POLYNUCLEOTIDES AND POLYPEPTIDES ENCODED THEREBY
DISTANTLY HOMOLOGOUS TO HEPARANASE 

FIELD AND BACKGROUND OF THE INVENTION 
The present invention relates to novel polynucleotides encoding
polypeptides distantly homologous to heparanase, nucleic acid constructs
including the polynucleotides, genetically modified cells expressing same,
recombinant proteins encoded thereby and which may have heparanase or
other glycosyl hydrolase activity, antibodies recognizing the recombinant
proteins, oligonucleotides and oligonucleotide analogs derived from the
polynucleotides and ribozymes including same.

Citation or identification of any reference in this application shall 
not be construed as an admission that such reference is available as prior 
art to the present invention. 
Glycosaminoglycans (GAGs)

GAGs are polymers of repeated disaccharide units consisting of 
uronic acid and a hexosamine. Biosynthesis of GAGs except hyaluronic 
acid is initiated from a core protein. Proteoglycans may contain several 
GAG side chains from similar or different families. GAGs are synthesized
as homopolymers which may subsequently be modified by N-deacetylation
and N-sulfation, followed by C5-epimerization of glucuronic acid to
iduronic acid and O-sulfation. The chemical composition of GAGs varies
widely among tissues.

The natural metabolism of GAGs in animals is carried out by
hydrolysis. Generally, GAGs are degraded in a two-step procedure.
First, the proteoglycans are internalized in endosomes, where initial
depolymerization of the GAG chain takes place. This step is mainly
hydrolytic and yields oligosaccharides. Further degradation is carried out
after fusion with lysosomes, where desulfation and exolytic
depolymerization to monosaccharides take place (42).

The only mammalian GAG degrading endolytic enzymes 
characterized so far are the hyaluronidases. The hyaluronidases are a 
family of 1-4 endoglucosaminidases that depolymerize hyaluronic acid and 
chondroitin sulfate. The cDNAs encoding sperm associated PH-20
(Hyal3), and the lysosomal hyaluronidases Hyal1 and Hyal2 were cloned
and published (27). These enzymes share an overall homology of 40 %
and have different tissue specificities, cellular localizations and pH
optima.

Exolytic hydrolases are better characterized, among which are
β-glucuronidase, α-L-iduronidase, and β-N-acetylglucosaminidase. In
addition to hydrolysis of the glycosidic bond of the polysaccharide chain,
GAG degradation involves desulfation, which is catalyzed by several
lysosomal sulfatases such as N-acetylgalactosamine-4-sulfatase,
iduronate-2-sulfatase and heparin sulfamidase. Deficiency in any of the
lysosomal GAG degrading enzymes results in a lysosomal storage disease,
mucopolysaccharidosis.

Glycosyl hydrolases: 

Glycosyl hydrolases are a widespread group of enzymes that
hydrolyze the O-glycosidic bond between two or more carbohydrates or
between a carbohydrate and a noncarbohydrate moiety. The enzymatic
hydrolysis of the glycosidic bond occurs by one of two major
mechanisms, leading to overall retention or inversion of the anomeric
configuration. In both mechanisms catalysis involves two residues: a
proton donor and a nucleophile. Glycosyl hydrolases have been classified
into 58 families based on amino acid similarities. The glycosyl hydrolases
from families 1, 2, 5, 10, 17, 30, 35, 39 and 42 act on a large variety of
substrates; however, they all hydrolyze the glycosidic bond by a general
acid catalysis mechanism, with retention of the anomeric configuration.
The mechanism involves two glutamic acid residues, which serve as the
proton donor and the nucleophile, with an asparagine always preceding the
proton donor. Analyses of a set of known 3D structures from this group
revealed that their catalytic domains, despite the low level of sequence
identity, adopt a similar (α/β)8 fold with the proton donor and the
nucleophile located at the C-terminal ends of strands β4 and β7,
respectively. Mutations in the functional conserved amino acids of
lysosomal glycosyl hydrolases were identified in lysosomal storage diseases.

Lysosomal glycosyl hydrolases, including β-glucuronidase,
β-mannosidase, β-glucocerebrosidase, β-galactosidase and α-L-iduronidase,
are all exo-glycosyl hydrolases, belong to the GH-A clan and share a
similar catalytic site. However, many endo-glucanases from various
organisms, such as bacterial and fungal xylanases and cellulases, share this
catalytic domain (1).

Heparan sulfate proteoglycans (HSPGs) 

HSPGs are ubiquitous macromolecules associated with the cell
surface and extracellular matrix (ECM) of a wide range of cells of
vertebrate and invertebrate tissues (3-7). The basic HSPG structure
consists of a protein core to which several linear heparan sulfate chains are
covalently attached. The polysaccharide chains are typically composed of
repeating hexuronic and D-glucosamine disaccharide units that are
substituted to a varying extent with N- and O-linked sulfate moieties and
N-linked acetyl groups (3-7). Studies on the involvement of ECM
molecules in cell attachment, growth and differentiation revealed a central
role of HSPGs in embryonic morphogenesis, angiogenesis, metastasis,
neurite outgrowth and tissue repair (3-7). The heparan sulfate (HS)
chains, which are unique in their ability to bind a multitude of proteins,
ensure that a wide variety of effector molecules cling to the cell surface
(6-8). HSPGs are also prominent components of blood vessels (5). In large
vessels they are concentrated mostly in the intima and inner media,
whereas in capillaries they are found mainly in the subendothelial
basement membrane where they support proliferating and migrating
endothelial cells and stabilize the structure of the capillary wall. The
ability of HSPGs to interact with ECM macromolecules such as collagen,
laminin and fibronectin, and with different attachment sites on plasma
membranes suggests a key role for this proteoglycan in the self-assembly
and insolubility of ECM components, as well as in cell adhesion and
locomotion. Cleavage of HS may therefore result in disassembly of the
subendothelial ECM and hence may play a decisive role in extravasation
of normal and malignant blood-borne cells (9-11). HS catabolism is
observed in inflammation, wound repair, diabetes, and cancer metastasis,
suggesting that enzymes which degrade HS play important roles in
pathologic processes.

Heparanase 

Heparanase is a glycosylated enzyme that is involved in the
catabolism of certain glycosaminoglycans. It is an endoglucuronidase
that cleaves heparan sulfate at specific intrachain sites (12-15). Interaction
of T and B lymphocytes, platelets, granulocytes, macrophages and mast
cells with the subendothelial extracellular matrix (ECM) is associated with
degradation of heparan sulfate by heparanase activity (16). Connective
tissue activating peptide III (CTAP), an α-chemokine, was found to have
heparanase-like activity. Placenta heparanase acts as an adhesion
molecule or as a degradative enzyme depending on the pH of the
microenvironment (17).

Heparanase is released from intracellular compartments (e.g.,
lysosomes, specific granules) in response to various activation signals
(e.g., thrombin, calcium ionophores, immune complexes, antigens and
mitogens), suggesting its regulated involvement in inflammation and
cellular immunity responses (16).

It was also demonstrated that heparanase can be readily released
from human neutrophils by 60 minutes incubation at 4 °C in the absence of
added stimuli (18).

Gelatinase, another ECM degrading enzyme which is found in
tertiary granules of human neutrophils with heparanase, is secreted from
the neutrophils in response to phorbol 12-myristate 13-acetate (PMA)
treatment (19-20).

In contrast, various tumor cells appear to express and secrete 
heparanase in a constitutive manner in correlation with their metastatic 
potential (21). 

Degradation of heparan sulfate by heparanase results in the release
of heparin-binding growth factors, enzymes and plasma proteins that are
sequestered by heparan sulfate in basement membranes, extracellular
matrices and cell surfaces (22-23).

Heparanase activity has been described in a number of cell types
including cultured skin fibroblasts, human neutrophils, activated rat
T-lymphocytes, normal and neoplastic murine B-lymphocytes, human
monocytes and human umbilical vein endothelial cells, SK hepatoma cells,
human placenta and human platelets.

A procedure for purification of natural heparanase was reported for
SK hepatoma cells and human placenta (U.S. Pat. No. 5,362,641) and for
human platelet-derived enzymes (62).

Cloning and expression of the heparanase gene

A purified fraction of heparanase isolated from human hepatoma
cells was subjected to tryptic digestion. Peptides were separated by high
pressure liquid chromatography (HPLC) and microsequenced. The
sequence of one of the peptides was used to screen databases for
homology to the corresponding back-translated DNA sequence. This
procedure led to the identification of a clone containing an insert of 1020
base pairs (bp) which included an open reading frame of 963 bp followed
by 27 bp of 3' untranslated region and a poly A tail. The new gene was
designated hpa. Cloning of the missing 5' end of hpa was performed by
Marathon RACE from placenta cDNA composite. The joined hpa cDNA
(also referred to as ^hpa) fragment contained an open reading frame, 
which encodes a polypeptide of 543 amino acids with a calculated
molecular weight of 61,192 daltons (2). The cloning procedures are
described at length in U.S. Pat. application Nos. 08/922,170, 09/109,386,
and 09/258,892, the latter being a continuation-in-part of PCT/US98/17954,
filed August 31, 1998, all of which are incorporated herein by reference.
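
By way of illustration only, a calculated molecular weight such as the one quoted above is conventionally obtained by summing average residue masses over the deduced amino acid sequence and adding one water molecule for the free termini. The following minimal sketch assumes standard average residue masses; the function name polypeptide_mass is hypothetical and the heparanase sequence itself is not reproduced here.

```python
# Average residue masses in daltons (standard textbook values, assumed here).
RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER_MASS = 18.01528  # one water per chain accounts for the free N- and C-termini


def polypeptide_mass(sequence: str) -> float:
    """Return the average mass (Da) of an unmodified linear polypeptide."""
    sequence = sequence.upper()
    unknown = set(sequence) - set(RESIDUE_MASS)
    if unknown:
        raise ValueError(f"unsupported residue codes: {sorted(unknown)}")
    if not sequence:
        return 0.0
    return sum(RESIDUE_MASS[aa] for aa in sequence) + WATER_MASS
```

Such a calculation ignores glycosylation and other post-translational modifications, which is one reason an apparent size on a gel may differ from the calculated value.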
The genomic locus which encodes heparanase spans about 40 kb. It
is composed of 12 exons separated by 11 introns and is localized on
human chromosome 4.

The ability of the hpa gene product to catalyze degradation of
heparan sulfate (HS) in vitro was examined by expressing the entire open
reading frame of hpa in High five and Sf21 insect cells, and the
mammalian human 293 embryonic kidney cell line expression systems.
Extracts of infected or transfected cells were assayed for heparanase
catalytic activity. For this purpose, cell lysates were incubated with sulfate
labeled, ECM-derived HSPG (peak I), followed by gel filtration analysis
(Sepharose 6B) of the reaction mixture. While the substrate alone
consisted of high molecular weight material, incubation of the HSPG
substrate with lysates of cells infected or transfected with hpa containing
vectors resulted in a complete conversion of the high molecular weight
substrate into low molecular weight labeled heparan sulfate degradation
fragments (see, for example, U.S. Pat. application No. 09/071,618, which
is incorporated herein by reference).

In other experiments, it was demonstrated that the heparanase
enzyme expressed by cells infected with a pPhpa virus is capable of
degrading HS complexed to other macromolecular constituents (e.g.,
fibronectin, laminin, collagen) present in a naturally produced intact ECM
(see U.S. Pat. application No. 09/109,386, which is incorporated herein by
reference), in a manner similar to that reported for highly metastatic tumor
cells or activated cells of the immune system (7, 8).

Preferential expression of the hpa gene in human breast and
hepatocellular carcinomas

Semi-quantitative RT-PCR was applied to evaluate the expression
of the hpa gene by human breast carcinoma cell lines exhibiting different
degrees of metastasis. A marked increase in hpa gene expression was
observed, correlating with the metastatic capacity of the non-metastatic
MCF-7, moderately metastatic MDA 231 and highly metastatic
MDA 435 breast carcinoma cell lines. Significantly, the differential
pattern of the hpa gene expression correlated with the pattern of
heparanase activity.

Expression of the hpa gene in human breast carcinoma was
demonstrated by in situ hybridization to archival paraffin embedded
human breast tissue. Hybridization of the heparanase antisense riboprobe
to invasive duct carcinoma tissue sections resulted in a massive positive
staining localized specifically to the carcinoma cells. The hpa gene was
also expressed in areas adjacent to the carcinoma showing fibrocystic
changes. Normal breast tissue derived from reduction mammoplasty failed
to express the hpa transcript. High expression of the hpa gene was also
observed in tissue sections derived from human hepatocellular carcinoma
specimens but not in normal adult liver tissue. Furthermore, tissue
specimens derived from adenocarcinoma of the ovary, squamous cell
carcinoma of the cervix and colon adenocarcinoma exhibited strong
staining with the hpa RNA probe, as compared to a very low staining of
the hpa mRNA in the respective non-malignant control tissues (2).

A preferential expression of heparanase in human tumors versus the
corresponding normal tissues was also noted by immunohistochemical
staining of paraffin embedded sections with monoclonal anti-heparanase
antibodies. Positive cytoplasmic staining was found in neoplastic cells of
the colon carcinoma and in dysplastic epithelial cells of a tubulovillous
adenoma found in the same specimen, while there was little or no staining
of the normal looking colon epithelium located away from the carcinoma.
Of particular significance was an intense immunostaining of colon
adenocarcinoma cells that had metastasized into the liver, as compared to
the surrounding normal liver tissue.

Latent and active forms of the heparanase protein

The apparent molecular size of the recombinant enzyme produced
in the baculovirus expression system was about 65 kDa. This heparanase
polypeptide contains 6 potential N-glycosylation sites. Following
deglycosylation by treatment with peptide N-glycosidase, the protein
appeared as a 57 kDa band. This molecular weight corresponds to the
deduced molecular mass (61,192 daltons) of the 543 amino acid
polypeptide encoded by the full length hpa cDNA after cleavage of the
predicted 3 kDa signal peptide. No further reduction in the apparent size
of the N-deglycosylated protein was observed following concurrent
O-glycosidase and neuraminidase treatment. Deglycosylation had no
detectable effect on enzymatic activity.

Unlike the baculovirus enzyme, expression of the full length
heparanase polypeptide in mammalian cells (e.g., 293 kidney cells, CHO)
yielded a major protein of about 50 kDa and a minor protein of about
65 kDa in cell lysates. Preferential release of the about 65 kDa form into
the culture medium was noted in some of the transfected CHO clones.
Comparison of the enzymatic activity of the two forms, using a
semi-quantitative gel filtration assay, revealed that the 50 kDa enzyme is
about 100-fold more active than the 65 kDa form. A similar difference was
observed when the specific activity of the recombinant 65 kDa baculovirus
enzyme was compared to that of the 50 kDa heparanase preparations
purified from human platelets, SK-hep-1 cells, or placenta. These results
suggest that the 50 kDa protein is a mature processed form of a latent
heparanase precursor. Amino terminal sequencing of the platelet
heparanase indicated that cleavage occurs between amino acids Glu157
and Lys158. As indicated by the hydropathic plot of heparanase, this site
is located within a hydrophilic peak which is likely to be exposed and
hence accessible to proteases.
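
For illustration, a hydropathic plot of the kind referred to above can be sketched as a sliding-window average over the Kyte and Doolittle hydropathy scale, where strongly negative window averages mark hydrophilic, likely surface-exposed regions. The sketch below is only an assumption about how such a profile might be computed; the function name hydropathy_profile and the default window of 9 residues are hypothetical choices, not values taken from this document.

```python
# Kyte-Doolittle hydropathy index for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(sequence: str, window: int = 9) -> list[float]:
    """Return the mean hydropathy of every full window along the sequence.

    Element i is the average over residues i .. i + window - 1, so the
    result has len(sequence) - window + 1 entries (empty if too short).
    """
    sequence = sequence.upper()
    if window < 1 or window > len(sequence):
        return []
    scores = [KYTE_DOOLITTLE[aa] for aa in sequence]
    return [
        sum(scores[i:i + window]) / window
        for i in range(len(sequence) - window + 1)
    ]
```

A cleavage site lying in a window with a markedly negative average would be consistent with the hydrophilic peak described above.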

Involvement of Heparanase in Tumor Cell Invasion and 
Metastasis 

Circulating tumor cells arrested in the capillary beds often attach at
or near the intercellular junctions between adjacent endothelial cells. Such
attachment of the metastatic cells is followed by rupture of the junctions,
retraction of the endothelial cell borders and migration through the breach
in the endothelium toward the exposed underlying basement membrane (BM)
(24). Once located between endothelial cells and the BM, the invading
cells must degrade the subendothelial glycoproteins and proteoglycans of
the BM in order to migrate out of the vascular compartment. Several
cellular enzymes (e.g., collagenase IV, plasminogen activator, cathepsin B,
elastase, etc.) are thought to be involved in degradation of BM (25).
Among these enzymes is heparanase that cleaves HS at specific intrachain
sites (16, 11). Expression of a HS degrading heparanase was found to
correlate with the metastatic potential of mouse lymphoma (26),
fibrosarcoma and melanoma (21) cells. Moreover, elevated levels of
heparanase were detected in sera from metastatic tumor bearing animals
and melanoma patients (21) and in tumor biopsies of cancer patients (12).

The inhibitory effect of various non-anticoagulant species of
heparin on heparanase was examined in view of their potential use in
preventing extravasation of blood-borne cells. Treatment of experimental
animals with heparanase inhibitors markedly reduced (> 90 %) the
incidence of lung metastases induced by B16 melanoma, Lewis lung
carcinoma and mammary adenocarcinoma cells (12, 13, 28). Heparin
fractions with high and low affinity to anti-thrombin III exhibited a
comparable high anti-metastatic activity, indicating that the heparanase
inhibiting activity of heparin, rather than its anticoagulant activity, plays a
role in the anti-metastatic properties of the polysaccharide (12).

The direct role of heparanase in cancer metastasis was
demonstrated by two experimental systems. The murine T-lymphoma cell
line Eb has no detectable heparanase activity. Whether the introduction of
the hpa gene into Eb cells would confer a metastatic behavior on these
cells was investigated. To this end, Eb cells were transfected with a
full length human hpa cDNA. Stably transfected cells showed high
expression of the heparanase mRNA and enzyme activity. These hpa and
mock transfected Eb cells were injected subcutaneously into DBA/2 mice
and mice were tested for survival time and liver metastases. All mice
(n=20) injected with mock transfected cells survived during the first 4
weeks of the experiment, while 50% mortality was observed in mice
inoculated with Eb cells transfected with the hpa cDNA. The liver of mice
inoculated with hpa transfected cells was infiltrated with numerous Eb
lymphoma cells, as was evident both by macroscopic evaluation of the
liver surface and microscopic examination of tissue sections. In contrast,
metastatic lesions could not be detected by gross examination of the liver
of mice inoculated with mock transfected control Eb cells. Few or no
lymphoma cells were found to infiltrate the liver tissue. In a different
model of tumor metastasis, transient transfection of the heparanase gene
into low metastatic B16-F1 mouse melanoma cells, followed by i.v.
inoculation, resulted in a 4- to 5-fold increase in lung metastases.

Finally, heparanase externally adhered to B16-F1 melanoma cells
increased the level of lung metastases in C57BL mice as compared to
control mice (see U.S. Pat. application No. 09/260,037, entitled
INTRODUCING A BIOLOGICAL MATERIAL INTO A PATIENT,
which is a continuation in part of U.S. Pat. application No. 09/140,888,
and is incorporated herein by reference).

Possible involvement of heparanase in tumor angiogenesis

Fibroblast growth factors are a family of structurally related
polypeptides characterized by high affinity to heparin (29). They are
highly mitogenic for vascular endothelial cells and are among the most
potent inducers of neovascularization (29-30). Basic fibroblast growth
factor (bFGF) has been extracted from a subendothelial ECM produced in
vitro (31) and from basement membranes of the cornea (32), suggesting
that ECM may serve as a reservoir for bFGF. Immunohistochemical
staining revealed the localization of bFGF in basement membranes of
diverse tissues and blood vessels (23). Despite the ubiquitous presence of
bFGF in normal tissues, endothelial cell proliferation in these tissues is
usually very low, suggesting that bFGF is somehow sequestered from its
site of action. Studies on the interaction of bFGF with ECM revealed that
bFGF binds to HSPG in the ECM and can be released in an active form by
HS degrading enzymes (33, 32, 34). It was demonstrated that heparanase
activity expressed by platelets, mast cells, neutrophils, and lymphoma cells
is involved in release of active bFGF from ECM and basement membranes
(35), suggesting that heparanase activity may not only function in cell
migration and invasion, but may also elicit an indirect neovascular
response. These results suggest that the ECM HSPG provides a natural
storage depot for bFGF and possibly other heparin-binding growth
promoting factors (36, 37). Displacement of bFGF from its storage within
basement membranes and ECM may therefore provide a novel mechanism
for induction of neovascularization in normal and pathological situations.

Recent studies indicate that heparin and HS are involved in binding
of bFGF to high affinity cell surface receptors and in bFGF cell signaling
(38, 39). Moreover, the size of HS required for optimal effect was similar
to that of HS fragments released by heparanase (40). Similar results were
obtained with vascular endothelial cell growth factor (VEGF) (41),
suggesting the operation of a dual receptor mechanism involving HS in
cell interaction with heparin-binding growth factors. It is therefore
proposed that restriction of endothelial cell growth factors in ECM
prevents their systemic action on the vascular endothelium, thus
maintaining a very low rate of endothelial cell turnover and vessel
growth. On the other hand, release of bFGF from storage in ECM as a
complex with HS fragment may elicit localized endothelial cell
proliferation and neovascularization in processes such as wound healing,
inflammation and tumor development (36, 37).

The involvement of heparanase in other physiological processes 
and its potential therapeutic applications 

Apart from its involvement in tumor cell metastasis, inflammation
and autoimmunity, mammalian heparanase may be applied to modulate
bioavailability of heparin-binding growth factors; cellular responses to
heparin-binding growth factors (e.g., bFGF, VEGF) and cytokines (IL-8)
(44, 41); cell interaction with plasma lipoproteins (49); cellular
susceptibility to certain viral and some bacterial and protozoa infections
(45-47); and disintegration of amyloid plaques (48).

Viral Infection: The presence of heparan sulfate on cell surfaces
has been shown to be the principal requirement for the binding of Herpes
Simplex (45) and Dengue (46) viruses to cells and for subsequent infection
of the cells. Removal of the cell surface heparan sulfate by heparanase
may therefore abolish virus infection. In fact, treatment of cells with
bacterial heparitinase (degrading heparan sulfate) or heparinase (degrading
heparin) reduced the binding of two related animal herpes viruses to cells
and rendered the cells at least partially resistant to virus infection (45).
There are some indications that the cell surface heparan sulfate is also
involved in HIV infection (47).

Neurodegenerative diseases: Heparan sulfate proteoglycans were
identified in the prion protein amyloid plaques of Gerstmann-Sträussler
Syndrome, Creutzfeldt-Jakob disease and Scrapie (48). Heparanase may
disintegrate these amyloid plaques which are also thought to play a role in
the pathogenesis of Alzheimer's disease.

Restenosis and Atherosclerosis: Proliferation of arterial smooth
muscle cells (SMCs) in response to endothelial injury and accumulation of
cholesterol rich lipoproteins are basic events in the pathogenesis of
atherosclerosis and restenosis (50). Apart from its involvement in SMC
proliferation as a low affinity receptor for heparin-binding growth factors,
HS is also involved in lipoprotein binding, retention and uptake (51). It
was demonstrated that HSPG and lipoprotein lipase participate in a novel
catabolic pathway that may allow substantial cellular and interstitial
accumulation of cholesterol rich lipoproteins (49). The latter pathway is
expected to be highly atherogenic by promoting accumulation of apoB and
apoE rich lipoproteins (e.g., LDL, VLDL, chylomicrons), independent of
feedback inhibition by the cellular cholesterol content. Removal of SMC
HS by heparanase is therefore expected to inhibit both SMC proliferation
and lipid accumulation and thus may halt the progression of restenosis and
atherosclerosis.

Pulmonary diseases:
The data obtained from the literature suggest a possible role for
GAG degrading enzymes, such as, but not limited to, heparanases,
connective tissue activating peptide, heparinases, hyaluronidases, sulfatases
and chondroitinases, in reducing the viscosity of sinus and airway
secretions, with associated implications for curtailing the rate of infection
and inflammation. The sputum from CF patients contains at least 3 %
GAGs, which contribute to its volume and viscous properties.
Recombinant heparanase has been shown to reduce viscosity of sputum of
CF patients (see U.S. Pat. application No. 09/046,475).

In summary, heparanase may thus prove useful for conditions such
as wound healing, angiogenesis, restenosis, atherosclerosis, inflammation,
neurodegenerative diseases and viral infections. Mammalian heparanase
can be used to neutralize plasma heparin, as a potential replacement of
protamine. Anti-heparanase antibodies may be applied for
immunodetection and diagnosis of micrometastases, autoimmune lesions
and renal failure in biopsy specimens, plasma samples, and body fluids.

There is thus a widely recognized need for, and it would be highly
advantageous to have, additional molecules with glycosyl hydrolase
activity, because such molecules may exhibit greater specific activity
toward certain substrates or different substrate specificity than the known
heparanase.

SUMMARY OF THE INVENTION 

According to one aspect of the present invention there is provided
an isolated nucleic acid comprising a polynucleotide hybridizable with
SEQ ID NOs:1, 4, 6 or portions thereof at 68 °C in 6 x SSC, 1 % SDS, 5 x
Denhardt's, 10 % dextran sulfate, 100 µg/ml salmon sperm DNA, and 32P
labeled probe and wash at 68 °C with 3 x SSC and 0.1 % SDS.

According to another aspect of the present invention there is
provided an isolated nucleic acid comprising a polynucleotide hybridizable
with SEQ ID NOs:1, 4, 6 or portions thereof at 68 °C in 6 x SSC, 1 %
SDS, 5 x Denhardt's, 10 % dextran sulfate, 100 µg/ml salmon sperm DNA,
and 32P labeled probe and wash at 68 °C with 1 x SSC and 0.1 % SDS.

According to still another aspect of the present invention there is
provided an isolated nucleic acid comprising a polynucleotide hybridizable
with SEQ ID NOs:1, 4, 6 or portions thereof at 68 °C in 6 x SSC, 1 %
SDS, 5 x Denhardt's, 10 % dextran sulfate, 100 µg/ml salmon sperm DNA,
and 32P labeled probe and wash at 68 °C with 0.1 x SSC and 0.1 % SDS.
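
As an illustrative aside, the three wash conditions recited above (3 x, 1 x and 0.1 x SSC at the same temperature) differ in salt concentration, and lower salt generally means a more stringent wash. The sketch below assumes the common laboratory definition of 1 x SSC as 0.15 M sodium chloride plus 0.015 M trisodium citrate; the function name ssc_sodium_molar is hypothetical, and the SDS contribution to sodium is ignored.

```python
NACL_MOLAR_AT_1X = 0.150     # assumed: 1 x SSC contains 0.15 M NaCl
CITRATE_MOLAR_AT_1X = 0.015  # assumed: 1 x SSC contains 0.015 M trisodium citrate
SODIUM_PER_CITRATE = 3       # trisodium citrate releases three Na+ per molecule


def ssc_sodium_molar(fold: float) -> float:
    """Approximate Na+ concentration (mol/l) of an SSC solution of given strength."""
    if fold < 0:
        raise ValueError("SSC strength cannot be negative")
    per_1x = NACL_MOLAR_AT_1X + SODIUM_PER_CITRATE * CITRATE_MOLAR_AT_1X
    return fold * per_1x


# Washes ordered from least to most stringent at a fixed temperature.
washes = sorted([3.0, 1.0, 0.1], key=ssc_sodium_molar, reverse=True)
```

Under these assumptions the 0.1 x SSC wash carries roughly one thirtieth of the sodium of the 3 x SSC wash, so it tolerates the fewest mismatches between probe and target.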

According to yet another aspect of the present invention there is
provided an isolated nucleic acid comprising a polynucleotide at least 60
% identical with SEQ ID NOs:1, 4, 6 or portions thereof as determined
using the Bestfit procedure of the DNA sequence analysis software
package developed by the Genetics Computer Group (GCG) at the
University of Wisconsin (gap creation penalty - 50, gap extension penalty -
3).
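
For orientation only, a percent identity figure of the kind recited above is derived from an alignment of the two sequences. The sketch below does not reimplement the Bestfit procedure or its gap penalties; it merely assumes that an alignment has already been produced (with "-" marking gaps) and counts identical positions over the aligned, non-gap columns. The function name percent_identity and this particular denominator are assumptions, since alignment programs differ in how they normalize identity.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent of identical residues over columns where neither row has a gap.

    Both inputs must be rows of the same pairwise alignment, so they must
    have equal length.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [
        (x, y)
        for x, y in zip(aligned_a.upper(), aligned_b.upper())
        if x != gap and y != gap
    ]
    if not columns:
        return 0.0
    matches = sum(1 for x, y in columns if x == y)
    return 100.0 * matches / len(columns)


# Example: three of the four gap-free columns match.
identity = percent_identity("AC-GT", "ACTGA")
```

A sequence would meet the 60 % threshold under this convention if the returned value is at least 60.0.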

According to still another aspect of the present invention there is
provided an isolated nucleic acid comprising a polynucleotide encoding a
polypeptide being at least 60 % homologous with SEQ ID NOs:3, 5, 7 or
portions thereof as determined using the Bestfit procedure of the DNA
sequence analysis software package developed by the Genetics Computer
Group (GCG) at the University of Wisconsin (gap creation penalty - 50,
gap extension penalty - 3).

According to further features in preferred embodiments of the
invention described below, the polynucleotide is as set forth in SEQ ID
NOs:1, 4, 6 or portions thereof.

According to an additional aspect of the present invention there is
provided a recombinant protein comprising a polypeptide encoded by the
polynucleotides herein described.

According to yet an additional aspect of the present invention there
is provided a recombinant protein comprising a polypeptide at least 60 %
homologous with SEQ ID NOs:3, 5, 7 or portions thereof as determined
using the Bestfit procedure of the DNA sequence analysis software
package developed by the Genetics Computer Group (GCG) at the
University of Wisconsin (gap creation penalty - 50, gap extension penalty -
3).

According to further features in preferred embodiments of the
invention described below, the polypeptide is as set forth in SEQ ID
NOs:3, 5, 7 or portions thereof.

According to still an additional aspect of the present invention there 
is provided a nucleic acid construct comprising the isolated nucleic acid 
herein described. 

According to a further aspect of the present invention there is
provided a nucleic acid construct comprising a polynucleotide encoding
the recombinant protein herein described.

According to still a further aspect of the present invention there is
provided a host cell comprising a polynucleotide or construct and/or
expressing a recombinant protein as herein described.

According to yet a further aspect of the present invention there is
provided an antisense oligonucleotide or nucleic acid construct comprising
a polynucleotide or a polynucleotide analog of at least 10 bases being
hybridizable in vivo, under physiological conditions, with (i) a portion of
a polynucleotide strand encoding a polypeptide at least 60 % homologous
with SEQ ID NOs:3, 5, 7 or portions thereof as determined using the
Bestfit procedure of the DNA sequence analysis software package
developed by the Genetics Computer Group (GCG) at the University of
Wisconsin (gap creation penalty - 50, gap extension penalty - 3); or (ii) a
portion of a polynucleotide strand at least 60 % identical with SEQ ID
NOs:1, 4, 6 or portions thereof as determined using the Bestfit procedure
of the DNA sequence analysis software package developed by the Genetics
Computer Group (GCG) at the University of Wisconsin (gap creation
penalty - 50, gap extension penalty - 3).

According to another aspect of the present invention there is 
provided a ribozyme comprising the antisense oligonucleotide herein 
described and a ribozyme sequence. 

The present invention provides polynucleotides and polypeptides
belonging to a class of asp-glu glycosyl hydrolases of the GH-A clan
which, based on homology to heparanase, are probably GAG degrading
enzymes.

BRIEF DESCRIPTION OF THE DRAWINGS 

The invention is herein described, by way of example only, with 
reference to the accompanying drawings, wherein: 

FIG. 1 shows the nucleotide sequence (SEQ ID NOs:1-2) and the
deduced amino acid sequence (SEQ ID NOs:2-3) of hnhpl;

FIG. 2 is a comparison of the deduced amino acid sequences of
hnhpl (SEQ ID NOs:2-3) and of heparanase (SEQ ID NO:9). Comparison
was performed using the Gap program of the GCG package (gap creation 
penalty - 50, gap extension penalty - 3); 

FIG. 3 illustrates variability of hnhpl transcripts. Hnhpl was 
amplified from placenta and from testis marathon ready cDNA libraries, 
using the gene specific primers pn9-312u (SEQ ID NO:14) and hnll-230 
(SEQ ID NO: 11); 

FIG. 4 shows a zoo blot. Ten micrograms of genomic DNA from
various species were digested with EcoRI and separated on a 0.7 % agarose
- TBE gel. Following electrophoresis, the gel was treated with HCl and
then with NaOH and the DNA fragments were downward transferred to a
nylon membrane (Hybond N+, Amersham) with 0.4 N NaOH. The
membrane was hybridized with a 1.7 Kb DNA probe that contained the
hnhpl cDNA (clone pn9). Lane order: H - Human; M - Mouse; Rt - Rat; P
- Pig; Cw - Cow; Hr - Horse; S - Sheep; Rb - Rabbit; D - Dog; Ch -
Chicken; F - Fish. Size markers (Lambda BstEII) are shown on the left;

FIG. 5 illustrates cross hybridization between hpa and hnhpl. Hpa
was amplified by PCR from marathon ready placenta cDNA library;
Hnhpl was amplified from testis marathon ready cDNA library. PCR
products were run on agarose gel in duplicates and transferred to a nylon
membrane. One membrane was probed with 32P labeled hpa cDNA and
the other with hnhpl clone pn9.

FIG. 6 is a comparison of the hydropathic profiles of heparanase
and hnhpl. The curves were calculated according to the Kyte and Doolittle
method over a window of 17 amino acids.

FIG. 7 shows a Western blot analysis of recombinant hnhpl
expressed in human embryonal kidney 293 cells. A - control
heparanase-FLAG precursor, B-D - 293 cells transfected with a control
pSI vector (B), pSI-pn6 (C) and pSI-pn9 (D). Cell extracts were separated
by SDS-PAGE and transferred onto Immobilon-P nylon membrane
(Millipore). The membrane was incubated with anti-FLAG antibody 1:1000
(Kodak anti Flag M2 cat: IB 13025).

DESCRIPTION OF THE PREFERRED EMBODIMENTS

The present invention is of novel polynucleotides encoding 
polypeptides distantly homologous to heparanase, nucleic acid constructs 
including the polynucleotides, genetically modified cells expressing same, 
recombinant proteins encoded thereby and which may have heparanase or
other glycosyl hydrolase activity, antibodies recognizing the recombinant
proteins, oligonucleotides and oligonucleotide analogs derived from the 
polynucleotides and ribozymes including same. 

The principles and operation of the present invention may be better 
understood with reference to the drawings and accompanying descriptions. 

Before explaining at least one embodiment of the invention in
detail, it is to be understood that the invention is not limited in its
application to the details of construction and the arrangement of the 
components set forth in the following description or illustrated in the 
drawings. The invention is capable of other embodiments or of being
practiced or carried out in various ways. Also, it is to be understood that
the phraseology and terminology employed herein is for the purpose of
description and should not be regarded as limiting.

While reducing the present invention to practice, the human EST
database was screened for homologous sequences using the entire amino
acid sequence of human heparanase (SEQ ID NO:9). A distantly
homologous fragment was pulled out, accession number AI222323,
IMAGE clone number 1843155 from the Soares_NFL_T_GBC_S1 Homo
sapiens cDNA library prepared from testis, B-cells and fetal lungs. The
clone contained an insert of 560 bp (SEQ ID NO:23) of which the 3' 
region was homologous to the human hpa gene encoding human 
heparanase. Primers derived from the newly identified clone were used to 
isolate several cDNAs including several open reading frames which reflect
in-frame alternative splicing. The longest of these, pn6, appears in Figure
1 (SEQ ID NOs:1, 2 and 3); it is 2060 nucleotides long and contains an open
reading frame of 1776 nucleotides, which encodes a polypeptide of 592
amino acids, with a calculated molecular weight of 66.5 kDa. The newly
cloned gene was designated hnhpl. Two shorter forms, pn9 and pn5, and
their deduced amino acid sequences are set forth in SEQ ID NOs:4 and 6
and SEQ ID NOs:5 and 7, respectively, and are further described in the
Examples section that follows. Comparison between the amino acid
sequence of hnhpl and heparanase is shown in Figure 2. The homology
between the two proteins is 52.8 % or 55.3 %, depending on the software
employed. No cross hybridization was detected between hpa and hnhpl,
even under very moderate wash conditions (Figure 5). Zoo blot analysis
demonstrated that the hnhpl gene and other related genes, perhaps forming
a new gene family, are present in genomes of other organisms including
mammals and avians. The chromosome localization of hnhpl was
determined using G3 radiation hybrid panel to be on human chromosome 
10, next to the marker SHGC-57721. The results also indicated a 
possibility of a second copy of the gene or of a related gene. The hnhpl
gene is expressed at low levels in lymph nodes, spleen, colon and ovary; at
slightly higher levels in prostate and small intestine; and at a yet more
pronounced level in testis. No expression was detected under the assay
employed in bone marrow, liver, thymus, tonsil or leukocytes. Screening
of the mouse EST database with the amino acid sequence of heparanase as
well as of hnhpl pulled out a mouse EST clone (clone 1378452, accession
number AI019269 from mouse thymus, SEQ ID NO:8). However, this
clone includes two frame shift mutations which hamper its open reading
frame.

The overall homology between the amino acid sequence of hnhpl
and heparanase suggests that these two proteins share a similar function.
The homology between the two proteins is concentrated at several regions.
These may represent functional domains of the protein. The variability
may suggest potential differences in substrate recognition, cellular
localization and parameters of activity.

Despite the lack of an overall homology between heparanase
and other glycosyl hydrolases, the amino acid couple asn-glu (NE, SEQ ID
NO: 13), which is characteristic of the proton donor of glycosyl hydrolases
of the GH-A clan, was found at positions 224, 225 of heparanase. As in
other clan members, this NE couple is located at the end of a β strand. As
shown in Figure 2, the region surrounding the NE couple is conserved in
the predicted amino acid sequence of hnhpl. This suggests that the hnhpl
product is a glycosyl hydrolase. This definition may include any
polysaccharide degrading enzyme, either exo or endo glycosidase, and
based on the similarity to heparanase it is likely that hnhpl encodes a GAG
degrading enzyme.

In addition, superimposition of the hydropathic profiles of 
heparanase and hnhpl (Figure 6) indicates an overlapping pattern along 
the proteins. The amino acid sequence characteristic of glycosyl 
hydrolases is located within a hydrophilic peak and at the same position in 
the aligned proteins. A remarkable difference in the hydropathic pattern is 
noticed around amino acids 157, 158 of heparanase, which constitute the 
processing site of the enzyme. While in heparanase this site is located at
the tip of a hydrophilic peak, the equivalent region of hnhpl is not
hydrophilic. The peak around amino acid 110 of heparanase also appears
around amino acid 130 of hnhpl. Cleavage of heparanase at this region
was shown to result in enzyme activation. The equivalent region of hnhpl
might be a potential processing site.
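
By way of non-limiting illustration only, a sliding-window hydropathy calculation of the kind referred to above (Kyte and Doolittle scale, window of 17 amino acids) may be sketched in Python as follows. The function name hydropathy_profile and the toy sequence are hypothetical and are not taken from any SEQ ID NO; the software actually used to produce Figure 6 is not specified herein.

```python
# Kyte-Doolittle hydropathy index for the 20 standard amino acids.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(seq, window=17):
    """Return (center_position, mean_hydropathy) for every full window.

    Positions are 1-based and refer to the central residue of the window.
    An empty list is returned when the sequence is shorter than the window.
    """
    seq = seq.upper()
    if window < 1 or window > len(seq):
        return []
    half = window // 2
    profile = []
    for start in range(len(seq) - window + 1):
        segment = seq[start:start + window]
        mean = sum(KD[aa] for aa in segment) / window
        profile.append((start + half + 1, mean))
    return profile


# Hypothetical toy sequence: a hydrophobic stretch followed by a
# hydrophilic tail (illustration only).
toy = "MKK" + "L" * 15 + "DDEEKKRRNNQQSSTT"
for position, value in hydropathy_profile(toy):
    print(position, round(value, 2))
```

Positive values indicate hydrophobic regions and negative values hydrophilic regions, so two aligned proteins can be compared by superimposing their profiles.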

Heparanase has a potential signal peptide at the N-terminus of the 
67 kDa form. The homology between the two proteins is low at the N-
termini and no signal peptide was identified in the hnhpl polypeptide.

According to one aspect of the present invention there is provided
an isolated nucleic acid comprising a polynucleotide hybridizable with
SEQ ID NOs:1, 4, 6 or portions thereof at 68 °C in 6 x SSC, 1 % SDS, 5 x
Denhardt's, 10 % dextran sulfate, 100 µg/ml salmon sperm DNA, and 32P
labeled probe and wash at 68 °C with 3 x SSC, 1 x SSC or 0.1 x SSC and
0.1 % SDS.

As used herein in the specification and in the claims section that
follows, the term "portion" or "portions" refers to a consecutive stretch of
nucleotides or amino acids. Such a portion may include, for example, at
least 90 nucleotides (equivalent to at least 30 amino acids), at least 120
nucleotides (equivalent to at least 40 amino acids), at least 150 nucleotides
(equivalent to at least 50 amino acids), at least 180 nucleotides (equivalent
to at least 60 amino acids), at least 210 nucleotides (equivalent to at least
70 amino acids), at least 300 nucleotides (equivalent to at least 100 amino
acids), at least 600 nucleotides (equivalent to at least 200 amino acids), at
least 900 nucleotides (equivalent to at least 300 amino acids), at least
1,200 nucleotides (equivalent to at least 400 amino acids), at least 1,500
nucleotides (equivalent to at least 500 amino acids), or more.
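
The nucleotide to amino acid equivalences recited above follow from the three-nucleotide codon. A minimal Python sketch of this arithmetic is given below for illustration only; the function name encoded_amino_acids is hypothetical.

```python
CODON_LENGTH = 3  # nucleotides per encoded amino acid


def encoded_amino_acids(nucleotides):
    """Number of whole codons (amino acids) in a coding stretch."""
    if nucleotides < 0:
        raise ValueError("length must be non-negative")
    return nucleotides // CODON_LENGTH


# Portion lengths recited in the definition of "portion" above.
for nt in (90, 120, 150, 180, 210, 300, 600, 900, 1200, 1500):
    print(nt, "nucleotides ->", encoded_amino_acids(nt), "amino acids")

# The 1776-nucleotide open reading frame of pn6 described hereinabove.
print(encoded_amino_acids(1776))  # 592
```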

According to another aspect of the present invention there is
provided an isolated nucleic acid comprising a polynucleotide at least 60
%, preferably at least 65 %, more preferably at least 70 %, still preferably
at least 75 %, yet preferably at least 80 %, more preferably at least 85 %,
more preferably at least 90 %, most preferably at least 95 % - 100 %,
identical with SEQ ID NOs:1, 4, 6 or portions thereof as determined using
the Bestfit procedure of the DNA sequence analysis software package
developed by the Genetics Computer Group (GCG) at the University of
Wisconsin (gap creation penalty - 50, gap extension penalty - 3).

According to still another aspect of the present invention there is
provided an isolated nucleic acid comprising a polynucleotide encoding a
polypeptide being at least 60 %, preferably at least 65 %, more preferably
at least 70 %, still preferably at least 75 %, yet preferably at least 80 %,
more preferably at least 85 %, more preferably at least 90 %, most
preferably at least 95 % - 100 %, homologous with SEQ ID NOs:3, 5, 7 or
portions thereof as determined using the Bestfit procedure of the DNA
sequence analysis software package developed by the Genetics Computer
Group (GCG) at the University of Wisconsin (gap creation penalty - 50,
gap extension penalty - 3).

As used herein in the specification and in the claims section that 
follows, the term "homologous" refers to identical + similar. 
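
For illustration only, the distinction between percent identity and percent homology (identical plus similar residues) over an existing pairwise alignment may be sketched in Python as follows. The similarity groups, the treatment of gapped columns and the function names are assumptions made for this sketch; the Bestfit procedure derives similarity from its own scoring matrix, which is not reproduced here.

```python
# Hypothetical groups of "similar" residues, for illustration only.
SIMILAR_GROUPS = [
    set("ILVM"), set("FYW"), set("KRH"), set("DE"),
    set("NQ"), set("ST"), set("AG"),
]


def is_similar(a, b):
    """True when two different residues fall in the same similarity group."""
    return any(a in group and b in group for group in SIMILAR_GROUPS)


def identity_and_homology(aligned_1, aligned_2, gap="-"):
    """Return (percent identity, percent homology) over non-gap columns.

    Homology is counted as identical plus similar positions.
    """
    if len(aligned_1) != len(aligned_2):
        raise ValueError("aligned sequences must have equal length")
    identical = similar = compared = 0
    for a, b in zip(aligned_1.upper(), aligned_2.upper()):
        if a == gap or b == gap:
            continue  # assumption: gapped columns are not scored
        compared += 1
        if a == b:
            identical += 1
        elif is_similar(a, b):
            similar += 1
    if compared == 0:
        return 0.0, 0.0
    return (100.0 * identical / compared,
            100.0 * (identical + similar) / compared)


# Hypothetical toy alignment (not taken from any SEQ ID NO).
identity, homology = identity_and_homology("MKLNE-PS", "MRLNEAPT")
print(round(identity, 1), round(homology, 1))
```

In this toy alignment five of the seven scored columns are identical and the remaining two are similar, so the homology figure exceeds the identity figure.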

According to an additional aspect of the present invention there is
provided a recombinant protein comprising a polypeptide encoded by the
polynucleotides herein described.

The nucleic acid according to the present invention can be a
complementary polynucleotide sequence, genomic polynucleotide
sequence or a composite polynucleotide sequence.

As used herein the phrase "complementary polynucleotide
sequence" includes sequences which originally result from reverse
transcription of messenger RNA using a reverse transcriptase or any other 
RNA dependent DNA polymerase. Such sequences can be subsequently 
amplified in vivo or in vitro using a DNA dependent DNA polymerase. 

As used herein the phrase "genomic polynucleotide sequence"
includes sequences which originally derive from a chromosome and reflect
a contiguous portion of a chromosome. 

As used herein the phrase "composite polynucleotide sequence" 
includes sequences which are at least partially complementary and at least 
partially genomic. A composite sequence can include some exonal
sequences required to encode a polypeptide, as well as some intronic
sequences interposing therebetween. The intronic sequences can be of any 
source, including of other genes, and typically will include conserved 
splicing signal sequences. Such intronic sequences may further include cis 
acting expression regulatory elements. 

Thus, this aspect of the present invention encompasses (i)
polynucleotides as set forth in SEQ ID NOs:1, 4 and 6; (ii) fragments or
portions thereof; (iii) sequences hybridizable therewith; (iv) sequences
homologous thereto; (v) genomic and composite sequences corresponding
thereto; (vi) sequences encoding similar polypeptides with different codon
usage; and (vii) altered sequences characterized by mutations, such as
deletion, insertion or substitution of one or more nucleotides, either
naturally occurring or man induced, either randomly or in a targeted
fashion.

According to yet an additional aspect of the present invention there
is provided a recombinant protein comprising a polypeptide at least 60 %,
preferably at least 65 %, more preferably at least 70 %, still preferably at
least 75 %, yet preferably at least 80 %, more preferably at least 85 %,
more preferably at least 90 %, most preferably at least 95 % - 100 %,
homologous with SEQ ID NOs:3, 5, 7 or portions thereof, as determined
using the Bestfit procedure of the DNA sequence analysis software
package developed by the Genetics Computer Group (GCG) at the
University of Wisconsin (gap creation penalty - 50, gap extension penalty -
3).

According to still an additional aspect of the present invention there
is provided a nucleic acid construct comprising the isolated nucleic acid
herein described.

According to a preferred embodiment of the present invention the
nucleic acid construct further comprises a promoter for regulating the
expression of the isolated nucleic acid in a sense or antisense orientation.
Such promoters are known to be cis-acting sequence elements required for
transcription as they serve to bind DNA dependent RNA polymerase
which transcribes sequences present downstream thereof. Such downstream
sequences can be in either one of two possible orientations to result
in the transcription of sense RNA which is translatable by the ribosome
machinery or antisense RNA which typically does not contain translatable
sequences, yet can duplex or triplex with endogenous sequences, either
mRNA or chromosomal DNA and hamper gene expression, all as further
detailed hereinunder.

While the isolated nucleic acid described herein is an essential 
element of the invention, it is modular and can be used in different 
contexts. The promoter of choice that is used in conjunction with this 
invention is of secondary importance, and will comprise any suitable
promoter. It will be appreciated by one skilled in the art, however, that it
is necessary to make sure that the transcription start site(s) will be located 
upstream of an open reading frame. In a preferred embodiment of the 
present invention, the promoter that is selected comprises an element that 
is active in the particular host cells of interest. These elements may be
selected from transcriptional regulators that activate the transcription of
genes essential for the survival of these cells in conditions of stress or 
starvation, including, but not limited to, the heat shock proteins. 

A construct according to the present invention preferably further
includes an appropriate selectable marker. In a more preferred
embodiment according to the present invention the construct further
includes an origin of replication. In another most preferred embodiment
according to the present invention the construct is a shuttle vector, which
can propagate both in E. coli (wherein the construct comprises an
appropriate selectable marker and origin of replication) and be compatible
for propagation in cells, or integration in the genome, of an organism of
choice. The construct according to this aspect of the present invention can
be, for example, a plasmid, a bacmid, a phagemid, a cosmid, a phage, a
virus or an artificial chromosome.

Alternatively, the nucleic acid construct according to this aspect of
the present invention further includes positive and negative selection
markers and may therefore be employed for selecting for homologous
recombination events, including, but not limited to, homologous
recombination employed in knock-in and knock-out procedures. One
ordinarily skilled in the art can readily design knock-out or knock-in
constructs including both positive and negative selection genes for
efficiently selecting transfected embryonic stem cells that underwent a
homologous recombination event with the construct. Such cells can be
introduced into developing embryos to generate chimeras, the offspring
of which can be tested for carrying the knock-out or knock-in constructs.
Knock-out and/or knock-in constructs according to the present invention
can be used to further investigate the functionality of the new gene. Such
constructs can also be used in somatic and/or germ cell gene therapy to
destroy activity of a defective, gain of function allele or to replace the lack
of activity of a silent allele in an organism, thereby to down or upregulate
activity, as required. Further detail relating to the construction and use of
knock-out and knock-in constructs can be found in Fukushige, S. and
Ikeda, J.E.: Trapping of mammalian promoters by Cre-lox site-specific
recombination. DNA Res 3 (1996) 73-80; Bedell, M.A., Jenkins, N.A. and
Copeland, N.G.: Mouse models of human disease. Part I: Techniques and
resources for genetic analysis in mice. Genes and Development 11 (1997)
1-11; Bermingham, J.R., Jr., Scherer, S.S., O'Connell, S., Arroyo, E., Kalla,
K.A., Powell, F.L. and Rosenfeld, M.G.: Tst-1/Oct-6/SCIP regulates a
unique step in peripheral myelination and is required for normal
respiration. Genes Dev 10 (1996) 1751-62, which are incorporated herein
by reference.

According to yet another aspect of the present invention there is
provided a host cell or animal comprising a nucleic acid construct or a
portion thereof as described herein. Methods of transforming host cells,
both prokaryotes and eukaryotes, and organisms with nucleic acid
constructs and selection of transformants (e.g., transformed cells or
transgenic animals) are well known to those of skill in the art. In
addition, once transfected, such cells and organisms can be designed to
direct the production of ample amounts of a recombinant protein which
can then be purified by known methods, including, but not limited to,
various chromatography and gel electrophoresis methods. Such a purified
recombinant protein can serve for elicitation of antibodies as further
detailed hereinunder. Methods of transformation of cells and organisms are
described in detail in reference 43, whereas methods of recombinant
protein purification are described in detail in reference 52, both of which
are incorporated herein by reference.

According to still another aspect of the present invention there is
provided an oligonucleotide of at least 17, at least 18, at least 19, at least
20, at least 22, at least 25, at least 30 or at least 40 bases specifically
hybridizable with the isolated nucleic acid described herein.

Hybridization of shorter nucleic acids (below 200 bp in length, e.g.
17-40 bp in length) is effected by stringent, moderate or mild
hybridization, wherein stringent hybridization is effected by a
hybridization solution of 6 x SSC and 1 % SDS or 3 M TMACl, 0.01 M
sodium phosphate (pH 6.8), 1 mM EDTA (pH 7.6), 0.5 % SDS, 100 µg/ml
denatured salmon sperm DNA and 0.1 % nonfat dried milk, hybridization
temperature of 1 - 1.5 °C below the Tm, final wash solution of 3 M
TMACl, 0.01 M sodium phosphate (pH 6.8), 1 mM EDTA (pH 7.6), 0.5 %
SDS at 1 - 1.5 °C below the Tm; moderate hybridization is effected by a
hybridization solution of 6 x SSC and 0.1 % SDS or 3 M TMACl, 0.01 M
sodium phosphate (pH 6.8), 1 mM EDTA (pH 7.6), 0.5 % SDS, 100 µg/ml
denatured salmon sperm DNA and 0.1 % nonfat dried milk, hybridization
temperature of 2 - 2.5 °C below the Tm, final wash solution of 3 M
TMACl, 0.01 M sodium phosphate (pH 6.8), 1 mM EDTA (pH 7.6), 0.5 %
SDS at 1 - 1.5 °C below the Tm, final wash solution of 6 x SSC, and final
wash at 22 °C; whereas mild hybridization is effected by a hybridization
solution of 6 x SSC and 1 % SDS or 3 M TMACl, 0.01 M sodium
phosphate (pH 6.8), 1 mM EDTA (pH 7.6), 0.5 % SDS, 100 µg/ml
denatured salmon sperm DNA and 0.1 % nonfat dried milk, hybridization
temperature of 37 °C, final wash solution of 6 x SSC and final wash at 22
°C.
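
For illustration only, the hybridization temperatures recited above for short oligonucleotides may be derived from a known melting temperature as sketched below in Python. The function name hybridization_temperature is hypothetical, and the Tm value is assumed to be supplied by the user; no particular method of determining Tm is implied.

```python
def hybridization_temperature(tm_celsius, stringency):
    """Return the (low, high) hybridization temperature range in deg C.

    stringent: 1 - 1.5 deg C below the Tm
    moderate:  2 - 2.5 deg C below the Tm
    mild:      fixed at 37 deg C, independent of the Tm
    """
    if stringency == "stringent":
        return (tm_celsius - 1.5, tm_celsius - 1.0)
    if stringency == "moderate":
        return (tm_celsius - 2.5, tm_celsius - 2.0)
    if stringency == "mild":
        return (37.0, 37.0)
    raise ValueError("stringency must be 'stringent', 'moderate' or 'mild'")


# Example with an assumed Tm of 60 deg C.
for level in ("stringent", "moderate", "mild"):
    print(level, hybridization_temperature(60.0, level))
```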

According to an additional aspect of the present invention there is
provided a pair of oligonucleotides each independently of at least 17, at
least 18, at least 19, at least 20, at least 22, at least 25, at least 30 or at
least 40 bases specifically hybridizable with the isolated nucleic acid
described herein in an opposite orientation so as to direct exponential
amplification of a portion thereof in a nucleic acid amplification reaction,
such as a polymerase chain reaction. The polymerase chain reaction and
other nucleic acid amplification reactions are well known in the art and
require no further description herein. The pair of oligonucleotides
according to this aspect of the present invention are preferably selected to
have compatible melting temperatures (Tm), e.g., melting temperatures
which differ by less than 7 °C, preferably less than 5 °C, more
preferably less than 4 °C, most preferably less than 3 °C, ideally between 3
°C and zero °C. Consequently, according to yet an additional aspect of
the present invention there is provided a nucleic acid amplification product
obtained using the pair of primers described herein. Such a nucleic acid
amplification product can be isolated by gel electrophoresis or any other
size based separation technique. Alternatively, such a nucleic acid
amplification product can be isolated by affinity separation, either
strandness affinity or sequence affinity. In addition, once isolated, such a
product can be further genetically manipulated by restriction, ligation and
the like, to serve any one of a plurality of applications associated with up
and/or down regulation of activity.
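
For illustration only, the melting temperature compatibility of a primer pair may be checked as sketched below in Python. The Tm estimate uses the simple 2 °C per A/T, 4 °C per G/C rule of thumb, which is an assumption of this sketch and is not a method recited herein; the primer sequences and function names are hypothetical.

```python
def wallace_tm(primer):
    """Rough Tm estimate in deg C: 2 per A/T plus 4 per G/C (assumed rule)."""
    p = primer.upper()
    at = p.count("A") + p.count("T")
    gc = p.count("G") + p.count("C")
    return 2 * at + 4 * gc


def compatibility(tm_a, tm_b):
    """Classify a Tm difference against the thresholds recited above."""
    diff = abs(tm_a - tm_b)
    for limit in (3, 4, 5, 7):
        if diff < limit:
            return "<%d" % limit
    return ">=7"


# Hypothetical primers (not taken from any SEQ ID NO).
forward = "ATGCGTACCTGAAGTCA"
reverse = "TTGACCGTAGCATCAGG"
tm_f, tm_r = wallace_tm(forward), wallace_tm(reverse)
print(tm_f, tm_r, compatibility(tm_f, tm_r))
```

A pair classified as "<3" falls in the most preferred range, whereas ">=7" indicates that the pair lies outside the exemplified range.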

According to still an additional aspect of the present invention there
is provided an antisense oligonucleotide comprising a polynucleotide or a
polynucleotide analog of at least 10 bases, preferably between 10 and 15,
more preferably between 20 and 50 bases, most preferably, at least 17, at
least 18, at least 19, at least 20, at least 22, at least 25, at least 30 or at
least 40 bases being hybridizable in vivo, under physiological conditions,
with (i) a portion of a polynucleotide strand encoding a polypeptide at least
60 %, preferably at least 65 %, more preferably at least 70 %, still
preferably at least 75 %, yet preferably at least 80 %, more preferably at
least 85 %, more preferably at least 90 %, most preferably at least 95 % -
100 % homologous to SEQ ID NOs:3, 5, 7 or portions thereof as
determined using the Bestfit procedure of the
DNA sequence analysis software package developed by the Genetics
Computer Group (GCG) at the University of Wisconsin (gap creation
penalty - 50, gap extension penalty - 3); or (ii) a portion of a
polynucleotide strand at least 60 %, preferably at least 65 %, more
preferably at least 70 %, still preferably at least 75 %, yet preferably at
least 80 %, more preferably at least 85 %, more preferably at least 90 %,
most preferably at least 95 % - 100 % identical with SEQ ID NOs:1, 4, 6
or portions thereof as determined using the Bestfit procedure of the DNA
sequence analysis software package developed by the Genetics Computer
Group (GCG) at the University of Wisconsin (gap creation penalty - 12,
gap extension penalty - 4).

Such antisense oligonucleotides can be used to downregulate gene
expression as further detailed hereinunder. Such an antisense
oligonucleotide is readily synthesizable using solid phase oligonucleotide
synthesis.
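
For illustration only, the sequence of a DNA antisense oligonucleotide complementary to a chosen stretch of a target mRNA may be derived as the reverse complement of that stretch, as sketched below in Python. The function name antisense_dna and the toy target sequence are hypothetical and are not taken from any SEQ ID NO; the minimum length of 10 bases follows the text above.

```python
# Complement of each base, written as DNA; U in the target pairs with A.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A", "U": "A"}


def antisense_dna(target, start, length):
    """Return the DNA antisense oligonucleotide, written 5' to 3'.

    target: mRNA (or sense DNA) sequence, written 5' to 3'
    start:  0-based index of the first targeted base
    length: number of bases to target (at least 10)
    """
    if length < 10:
        raise ValueError("antisense oligonucleotide must be at least 10 bases")
    segment = target.upper()[start:start + length]
    if len(segment) < length:
        raise ValueError("target is too short for the requested segment")
    return "".join(COMPLEMENT[base] for base in reversed(segment))


# Hypothetical toy target (illustration only).
print(antisense_dna("AUGGCCAAGUUCGGA", 0, 10))  # ACTTGGCCAT
```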

The ability to chemically synthesize oligonucleotides and analogs
thereof having a selected predetermined sequence offers means for down
modulating gene expression. Three types of gene expression modulation
strategies may be considered.

At the transcription level, antisense or sense oligonucleotides or
analogs that bind to the genomic DNA by strand displacement or the
formation of a triple helix, may prevent transcription. At the transcript
level, antisense oligonucleotides or analogs that bind target mRNA
molecules lead to the enzymatic cleavage of the hybrid by intracellular
RNase H. In this case, by hybridizing to the targeted mRNA, the
oligonucleotides or oligonucleotide analogs provide a duplex hybrid
recognized and destroyed by the RNase H enzyme. Alternatively, such
hybrid formation may lead to interference with correct splicing. As a
result, in both cases, the number of the target mRNA intact transcripts
ready for translation is reduced or eliminated. At the translation level,
antisense oligonucleotides or analogs that bind target mRNA molecules
prevent, by steric hindrance, binding of essential translation factors
(ribosomes), to the target mRNA, a phenomenon known in the art as
hybridization arrest, disabling the translation of such mRNAs.

Thus, antisense sequences, which as described hereinabove may
arrest the expression of any endogenous and/or exogenous gene depending
on their specific sequence, attracted much attention from scientists and
pharmacologists who were devoted to developing the antisense approach
into a new pharmacological tool.

For example, several antisense oligonucleotides have been shown to
arrest hematopoietic cell proliferation, growth and entry into the S phase
of the cell cycle, to reduce survival and to prevent receptor mediated
responses.

For efficient in vivo inhibition of gene expression using antisense
oligonucleotides or analogs, the oligonucleotides or analogs must fulfill
the following requirements: (i) sufficient specificity in binding to the target
sequence; (ii) solubility in water; (iii) stability against intra- and
extracellular nucleases; (iv) capability of penetration through the cell
membrane; and (v) when used to treat an organism, low toxicity.

Unmodified oligonucleotides are typically impractical for use as
antisense sequences since they have short in vivo half-lives, during which
they are degraded rapidly by nucleases. Furthermore, they are difficult to
prepare in more than milligram quantities. In addition, such
oligonucleotides are poor cell membrane penetrators.

Thus it is apparent that in order to meet all the above listed 
requirements, oligonucleotide analogs need to be devised in a suitable 
manner. Therefore, an extensive search for modified oligonucleotides has 
been initiated. 

For example, problems arising in connection with double-stranded
DNA (dsDNA) recognition through triple helix formation have been
diminished by a clever "switch back" chemical linking, whereby a
sequence of polypurine on one strand is recognized, and by "switching
back", a homopurine sequence on the other strand can be recognized.
Also, good helix formation has been obtained by using artificial bases,
thereby improving binding conditions with regard to ionic strength and
pH.

In addition, in order to improve half-life as well as membrane
penetration, a large number of variations in polynucleotide backbones
have been made, albeit with little success.

Oligonucleotides can be modified either in the base, the sugar or the
phosphate moiety. These modifications include, for example, the use of
methylphosphonates, monothiophosphates, dithiophosphates,
phosphoramidates, phosphate esters, bridged phosphorothioates, bridged
phosphoramidates, bridged methylenephosphonates, dephospho
internucleotide analogs with siloxane bridges, carbonate bridges,
carboxymethyl ester bridges, acetamide bridges, carbamate bridges,
thioether bridges, sulfoxy bridges, sulfono bridges, various "plastic"
DNAs, α-anomeric bridges and borane derivatives.

International patent application WO 89/12060 discloses various
building blocks for synthesizing oligonucleotide analogs, as well as
oligonucleotide analogs formed by joining such building blocks in a
defined sequence. The building blocks may be either "rigid" (i.e.,
containing a ring structure) or "flexible" (i.e., lacking a ring structure). In
both cases, the building blocks contain a hydroxy group and a mercapto
group, through which the building blocks are said to join to form
oligonucleotide analogs. The linking moiety in the oligonucleotide
analogs is selected from the group consisting of sulfide (-S-), sulfoxide
(-SO-), and sulfone (-SO2-).

International patent application WO 92/20702 describes an acyclic
oligonucleotide which includes a peptide backbone on which any selected
chemical nucleobases or analogs are strung and serve as coding
characters as they do in natural DNA or RNA. These new compounds,
known as peptide nucleic acids (PNAs), are not only more stable in cells
than their natural counterparts, but also bind natural DNA and RNA 50 to
100 times more tightly than the natural nucleic acids cling to each other.
PNA oligomers can be synthesized from the four protected monomers
containing thymine, cytosine, adenine and guanine by Merrifield
solid-phase peptide synthesis. In order to increase solubility in water and
to prevent aggregation, a lysine amide group is placed at the C-terminal
region and may be pegylated.

Thus, antisense technology requires pairing of messenger RNA
with an oligonucleotide to form a double helix that inhibits translation.
The concept of antisense-mediated gene therapy was already introduced in
1978 for cancer therapy. This approach was based on certain genes that
are crucial in cell division and growth of cancer cells. Synthetic fragments
of genetic substance DNA can achieve this goal. Such molecules bind to
the targeted gene molecules in RNA of tumor cells, thereby inhibiting the
translation of the genes and resulting in dysfunctional growth of these
cells. Other mechanisms have also been proposed. These strategies have
been used, with some success, in treatment of cancers, as well as other
illnesses, including viral and other infectious diseases. Antisense
oligonucleotides are typically synthesized in lengths of 13-30 nucleotides.
The life span of oligonucleotide molecules in blood is rather short. Thus,
they have to be chemically modified to prevent destruction by ubiquitous
nucleases present in the body. Phosphorothioates are very widely used 

30 modification in antisense oligonucleotide ongoing clinical trials. A new 
generation of antisense molecules consist of hybrid antisense 
oligonucleotide with a central portion of synthetic DNA while four bases 
on each end have been modified with 2*0-methyl ribose to resemble RNA. 
In preclinical studies in laboratory animals, such compounds have 

35 demonstrated greater stability to metabolism in body tissues and an 
improved safety profile when compared with the first-generation 
unmodified phosphorothioate. Dosens of other nucleotide analogs have 
also been tested in antisense technology. 
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RNA oligonucleotides may also be used for antisense inhibition as 
they form a stable RNA-RNA duplex with the target, suggesting efficient 
inhibition. However, due to their low stability RNA oligonucleotides are 
typically expressed inside the cells using vectors designed for this purpose. 

5 This approach is favored when attempting to target a mRNA that encodes 
an abundant and long-lived protein. 
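
By way of a non-limiting illustration, the base pairing that underlies
antisense inhibition can be sketched in a few lines of Python. The sketch
assumes plain Watson-Crick pairing and returns the DNA oligonucleotide
(written 5' to 3') complementary to a stretch of mRNA. The function names
and the 18-nucleotide target are hypothetical and are not taken from any
sequence of the present invention; the 13-30 nucleotide window is the
typical length range mentioned above.

```python
# Illustrative sketch only: hypothetical helper names and a made-up target.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "U": "A", "T": "A"}


def antisense_dna(mrna_target: str) -> str:
    """Return the DNA oligo (5'->3') that pairs with an mRNA segment (5'->3')."""
    target = mrna_target.upper()
    unknown = set(target) - set(COMPLEMENT)
    if unknown:
        raise ValueError(f"unexpected bases: {sorted(unknown)}")
    # Antiparallel pairing: read the target backwards and complement each base.
    return "".join(COMPLEMENT[base] for base in reversed(target))


def has_typical_length(oligo: str, shortest: int = 13, longest: int = 30) -> bool:
    """Check the oligo against the typical 13-30 nucleotide range."""
    return shortest <= len(oligo) <= longest


example_target = "AUGGCUAGCUUGACCGAU"  # made-up mRNA stretch, 18 nucleotides
example_oligo = antisense_dna(example_target)
```

Chemical modifications such as phosphorothioate linkages or 2'-O-methyl
ribose change the backbone and sugar, not the base sequence, so the same
complementarity rule would apply to a modified oligonucleotide.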

Recent scientific publications have validated the efficacy of
antisense compounds in animal models of hepatitis, cancers, coronary
artery restenosis and other diseases. The first antisense drug was recently
approved by the FDA. This drug, Fomivirsen, developed by Isis, is
indicated for local treatment of cytomegalovirus in patients with AIDS
who are intolerant of or have a contraindication to other treatments for
CMV retinitis or who were insufficiently responsive to previous treatments
for CMV retinitis (Pharmacotherapy News Network).

Several antisense compounds are now in clinical trials in the United
States. These include locally administered antivirals and systemic cancer
therapeutics. Antisense therapeutics have the potential to treat many
life-threatening diseases with a number of advantages over traditional
drugs. Traditional drugs intervene after a disease-causing protein is
formed. Antisense therapeutics, however, block mRNA
transcription/translation and intervene before a protein is formed, and
since antisense therapeutics target only one specific mRNA, they should be
more effective with fewer side effects than current protein-inhibiting
therapy.

A second option for disrupting gene expression at the level of
transcription uses synthetic oligonucleotides capable of hybridizing with
double stranded DNA. A triple helix is formed. Such oligonucleotides
may prevent binding of transcription factors to the gene's promoter and
therefore inhibit transcription. Alternatively, they may prevent duplex
unwinding and, therefore, transcription of genes within the triple helical
structure.

Thus, according to a further aspect of the present invention there is
provided a pharmaceutical composition comprising the antisense
oligonucleotide described herein and a pharmaceutically acceptable
carrier. The pharmaceutically acceptable carrier can be, for example, a
liposome loaded with the antisense oligonucleotide. Formulations for
topical administration may include, but are not limited to, lotions,
ointments, gels, creams, suppositories, drops, liquids, sprays and powders.
Conventional pharmaceutical carriers, aqueous, powder or oily bases,
thickeners and the like may be necessary or desirable. Compositions for
oral administration include powders or granules, suspensions or solutions
in water or non-aqueous media, sachets, capsules or tablets. Thickeners,
diluents, flavorings, dispersing aids, emulsifiers or binders may be
desirable. Formulations for parenteral administration may include, but are
not limited to, sterile aqueous solutions which may also contain buffers,
diluents and other suitable additives.

According to still a further aspect of the present invention there is
provided a ribozyme comprising the antisense oligonucleotide described
herein and a ribozyme sequence fused thereto. Such a ribozyme is readily
synthesizable using solid phase oligonucleotide synthesis.

Ribozymes are being increasingly used for the sequence-specific
inhibition of gene expression by the cleavage of mRNAs encoding
proteins of interest. The possibility of designing ribozymes to cleave any
specific target RNA has rendered them valuable tools in both basic
research and therapeutic applications. In the therapeutics area, ribozymes
have been exploited to target viral RNAs in infectious diseases, dominant
oncogenes in cancers and specific somatic mutations in genetic disorders.
Most notably, several ribozyme gene therapy protocols for HIV patients
are already in Phase 1 trials. More recently, ribozymes have been used for
transgenic animal research, gene target validation and pathway elucidation.
Several ribozymes are in various stages of clinical trials. ANGIOZYME
was the first chemically synthesized ribozyme to be studied in human
clinical trials. ANGIOZYME specifically inhibits formation of the VEGF-r
(Vascular Endothelial Growth Factor receptor), a key component in the
angiogenesis pathway. Ribozyme Pharmaceuticals, Inc., as well as other
firms, have demonstrated the importance of anti-angiogenesis therapeutics
in animal models. HEPTAZYME, a ribozyme designed to selectively
destroy Hepatitis C Virus (HCV) RNA, was found effective in decreasing
Hepatitis C viral RNA in cell culture assays (Ribozyme Pharmaceuticals,
Incorporated - WEB home page).

According to still another aspect of the present invention there is
provided an antibody comprising an immunoglobulin specifically
recognizing and binding a polypeptide at least 60 %, preferably at least
65 %, more preferably at least 70 %, still preferably at least 75 %, yet
preferably at least 80 %, more preferably at least 85 %, more preferably at
least 90 %, most preferably at least 95 % - 100 % homologous (identical +
similar) to SEQ ID NOs:3, 5, 7 or portions thereof, as determined
using the Bestfit procedure of the DNA sequence analysis software
package developed by the Genetic Computer Group (GCG) at the
University of Wisconsin (gap creation penalty - 50, gap extension penalty -
3). According to a preferred embodiment of this aspect of the present
invention the antibody specifically recognizes and binds the
polypeptides set forth in SEQ ID NOs:3, 5, 7 or portions thereof.

The present invention can utilize serum immunoglobulins,
polyclonal antibodies or fragments thereof (i.e., immunoreactive
derivatives of an antibody), or monoclonal antibodies or fragments thereof.
Monoclonal antibodies or purified fragments of the monoclonal antibodies
having at least a portion of an antigen binding region, such as
Fv, F(ab')2, Fab fragments (Harlow and Lane, 1988 Antibody, Cold
Spring Harbor), single chain antibodies (U.S. Patent 4,946,778), chimeric
or humanized antibodies and complementarity determining regions (CDR)
may be prepared by conventional procedures. Purification of these serum
immunoglobulins, antibodies or fragments can be accomplished by a
variety of methods known to those of skill, including precipitation by
ammonium sulfate or sodium sulfate followed by dialysis against saline,
ion exchange chromatography, affinity or immunoaffinity chromatography
as well as gel filtration, zone electrophoresis, etc. (see Goding in
Monoclonal Antibodies: Principles and Practice, 2nd ed., pp. 104-126,
1986, Orlando, Fla., Academic Press). Under normal physiological
conditions antibodies are found in plasma and other body fluids and in the
membrane of certain cells and are produced by lymphocytes of the type
denoted B cells or their functional equivalent. Antibodies of the IgG class
are made up of four polypeptide chains linked together by disulfide bonds.
The four chains of intact IgG molecules are two identical heavy chains
referred to as H-chains and two identical light chains referred to as
L-chains. Additional classes include IgD, IgE, IgA, IgM and related
proteins.

Methods for the generation and selection of monoclonal antibodies
are well known in the art, as summarized for example in reviews such as
Tramontano and Schloeder, Methods in Enzymology 178, 551-568, 1989.
A recombinant protein of the present invention may be used to generate
antibodies in vitro. More preferably, the recombinant protein of the
present invention is used to elicit antibodies in vivo. In general, a suitable
host animal is immunized with the recombinant protein of the present
invention. Advantageously, the animal host used is a mouse of an inbred
strain. Animals are typically immunized with a mixture comprising a
solution of the recombinant protein of the present invention in a
physiologically acceptable vehicle, and any suitable adjuvant, which
achieves an enhanced immune response to the immunogen. By way of
example, the primary immunization conveniently may be accomplished
with a mixture of a solution of the recombinant protein of the present
invention and Freund's complete adjuvant, said mixture being prepared in
the form of a water in oil emulsion. Typically the immunization may be
administered to the animals intramuscularly, intradermally,
subcutaneously, intraperitoneally, into the footpads, or by any appropriate
route of administration. The immunization schedule of the immunogen
may be adapted as required, but customarily involves several subsequent
or secondary immunizations using a milder adjuvant such as Freund's
incomplete adjuvant. Antibody titers and specificity of binding to the
recombinant protein can be determined during the immunization schedule
by any convenient method including, by way of example,
radioimmunoassay or enzyme linked immunosorbent assay, which is
known as the ELISA assay. When suitable antibody titers are achieved,
antibody producing lymphocytes from the immunized animals are
obtained, and these are cultured, selected and cloned, as is known in the
art. Typically, lymphocytes may be obtained in large numbers from the
spleens of immunized animals, but they may also be retrieved from the
circulation, the lymph nodes or other lymphoid organs. Lymphocytes are
then fused with any suitable myeloma cell line, to yield hybridomas, as is
well known in the art. Alternatively, lymphocytes may also be stimulated
to grow in culture, and may be immortalized by methods known in the art
including the exposure of these lymphocytes to a virus, a chemical or a
nucleic acid such as an oncogene, according to established protocols.
After fusion, the hybridomas are cultured under suitable culture
conditions, for example in multiwell plates, and the culture supernatants
are screened to identify cultures containing antibodies that recognize the
hapten of choice. Hybridomas that secrete antibodies that recognize the
recombinant protein of the present invention are cloned by limiting
dilution and expanded, under appropriate culture conditions. Monoclonal
antibodies are purified and characterized in terms of immunoglobulin type
and binding affinity.

Additional objects, advantages, and novel features of the present
invention will become apparent to one ordinarily skilled in the art upon
examination of the following examples, which are not intended to be
limiting. Additionally, each of the various embodiments and aspects of the
present invention as delineated hereinabove and as claimed in the claims
section below finds experimental support in the following examples.

EXAMPLES 

Reference is now made to the following examples, which together
with the above descriptions, illustrate the invention in a non-limiting
fashion.

Generally, the nomenclature used herein and the laboratory
procedures in recombinant DNA technology described below are those
well known and commonly employed in the art. Standard techniques are
used for cloning, DNA and RNA isolation, amplification and purification.
Generally, enzymatic reactions involving DNA ligase, DNA polymerase,
restriction endonucleases and the like are performed according to the
manufacturers' specifications. These techniques and various other
techniques are generally performed according to Sambrook et al.,
Molecular Cloning - A Laboratory Manual, Cold Spring Harbor
Laboratory, Cold Spring Harbor, N.Y. (1989), which is incorporated
herein by reference. Other general references are provided throughout this
document. The procedures therein are believed to be well known in the art
and are provided for the convenience of the reader. All the information
contained therein is incorporated herein by reference.

Materials and Experimental Methods 

The following protocols and experimental details are referenced in 
the Examples that follow: 

Primers list:

hn1l116    5'-GGAGAGCAAGTCTGTGTTGATTC-3'        (SEQ ID NO:10)
hn1l230    5'-CACTGGTAGCCATGAGTGTGAG-3'         (SEQ ID NO:11)
hn1u350    5'-TTGGTCATCCCTCCAGTCACCA-3'         (SEQ ID NO:12)
pn9-312u   5'-CTTGCCTGTAGACAGAGCTGCAG-3'        (SEQ ID NO:14)
hpu-685    5'-GAGCAGCCAGGTGAGCCCAAGA-3'         (SEQ ID NO:16)
hpl967     5'-TCAGATGCAAGCAGCAACTTTGGC-3'       (SEQ ID NO:17)
mn1u118    5'-CACCCTGATGTCATGCTGGAG-3'          (SEQ ID NO:18)
mn1l563    5'-CATCTAGGAGAGCAATGACGTTC-3'        (SEQ ID NO:19)
Ap1        5'-CCATCCTAATACGACTCACTATAGGGC-3'    (SEQ ID NO:20)
Ap2        5'-ACTCACTATAGGGCTCGAGCGGC-3'        (SEQ ID NO:21)
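
By way of a non-limiting illustration, basic properties of the listed
primers can be tabulated with a short Python sketch. It computes length,
GC fraction and a rough melting temperature using the Wallace rule,
Tm = 2(A+T) + 4(G+C) in °C. The Wallace rule is a crude estimate intended
for short oligonucleotides and is used here as an assumption for
illustration only; it is not stated to be the method by which the present
primers or annealing temperatures were chosen. The function names are
hypothetical. The two sequences are copied from the list above and keyed by
their sequence identifiers.

```python
# Illustrative sketch only: hypothetical helper names, Wallace-rule estimate.
PRIMERS = {
    "SEQ ID NO:10": "GGAGAGCAAGTCTGTGTTGATTC",
    "SEQ ID NO:12": "TTGGTCATCCCTCCAGTCACCA",
}


def gc_fraction(seq: str) -> float:
    """Fraction of G and C bases in the primer."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def wallace_tm(seq: str) -> int:
    """Rough melting temperature in deg C: 2 per A/T plus 4 per G/C."""
    seq = seq.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc


summary = {
    name: (len(seq), round(gc_fraction(seq), 3), wallace_tm(seq))
    for name, seq in PRIMERS.items()
}
```

More refined nearest-neighbor models account for salt and primer
concentration and would generally be preferred for actual primer design.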

Southern analysis:

Genomic DNA was extracted from animal or from human blood
using Blood and cell culture DNA maxi kit (Qiagen). DNA was digested
with EcoRI, separated by gel electrophoresis and transferred to a nylon
membrane Hybond N+ (Amersham). PCR products underwent a similar
procedure. Hybridization was performed at 68 °C in 6 x SSC, 1 % SDS,
5 x Denhardt's, 10 % dextran sulfate, 100 µg/ml salmon sperm DNA, and
32P-labeled probe. Pn9, a 1.7 kb fragment, which contains the entire open
reading frame except for a deletion of 162 nucleotides (del:473-634, SEQ
ID NO:1), was used as a probe. Following hybridization, the membrane
was washed with 3 x SSC, 0.1 % SDS, at 68 °C and exposed to X-ray film
for 3 days. Membranes were then washed with 0.1 x SSC, 0.1 % SDS, at
68 °C and were re-exposed for 4 days.

RT-PCR:

RNA was prepared using TRI-Reagent (Molecular Research Center
Inc.) according to the manufacturer's instructions. 1.25 µg were taken for
the reverse transcription reaction using SuperScript II reverse transcriptase
(Gibco BRL) and Oligo(dT)15 primer (SEQ ID NO:22) (Promega).
Amplification of the resultant first strand cDNA was performed with Taq
polymerase (Promega) or with Expand high fidelity (Boehringer
Mannheim).

cDNA Sequence analysis:

Sequence determinations were performed with vector specific and
gene specific primers, using an automated DNA sequencer (Applied
Biosystems, model 373A). Each nucleotide was read from at least two
independent primers. Computation and sequence analysis and alignments
were done using the DNA sequence analysis software package developed
by the Genetic Computer Group (GCG) at the University of Wisconsin.
Alignments of two sequences were performed using Bestfit (gap creation
penalty - 12, gap extension penalty - 4) or with the Gap program (gap
creation penalty - 50, gap extension penalty - 3).
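
By way of a non-limiting illustration, the homology figures used throughout
this document (identical + similar) can be derived from a finished pairwise
alignment as sketched below in Python. The sketch does not reproduce the
Bestfit or Gap algorithms. It assumes that the alignment has already been
produced, that gapped columns are excluded from the denominator, and that
"similar" means membership in the same group of a small, hypothetical table
of conservative substitutions. The GCG package uses its own scoring matrix
for this purpose, so values obtained with this sketch are not expected to
match GCG output.

```python
# Illustrative sketch only: the similarity groups below are an assumption.
SIMILAR_GROUPS = ("ILVM", "FWY", "KRH", "DE", "NQ", "ST", "AG")


def homology(aligned_a: str, aligned_b: str) -> tuple[float, float]:
    """Return (% identity, % similarity) over ungapped alignment columns."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    identical = similar = compared = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" or b == "-":
            continue  # gapped columns are skipped
        compared += 1
        if a == b:
            identical += 1
        elif any(a in group and b in group for group in SIMILAR_GROUPS):
            similar += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100 * identical / compared, 100 * similar / compared


# Made-up toy alignment, not taken from any SEQ ID NO of this document.
identity_pct, similarity_pct = homology("MKT-LLV", "MRTALIV")
total_homology_pct = identity_pct + similarity_pct
```

Total homology is then the sum of the two percentages, which is how the
identity and similarity values are combined in the Examples that follow.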

Tissue distribution:

Tissue distribution of the hnhp1 transcript was determined by
semi-quantitative PCR. cDNA panels were obtained from Clontech. PCR was
performed with the gene specific primers hn1u350 (SEQ ID NO:12) and
hn1l116 (SEQ ID NO:10). The PCR program was as follows: 94 °C, 3
minutes, followed by 40 cycles of 94 °C, 45 seconds, 64 °C, 1 minute,
72 °C, 1 minute. Samples were taken for further analysis following 25, 30,
35 and 40 cycles.
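
By way of a non-limiting illustration, the cycling program and sampling
points described above can be written down as data, as in the following
Python sketch. The temperatures, hold times and sampling cycles are those
given in the text. The class and function names are hypothetical, and the
elapsed times count programmed hold times only, since ramp times between
temperatures depend on the thermal cycler and are not specified herein.

```python
# Illustrative sketch only: hypothetical names; ramp times are ignored.
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    temp_c: int   # block temperature in deg C
    seconds: int  # programmed hold time


INITIAL_DENATURATION = Step(94, 180)
CYCLE = (Step(94, 45), Step(64, 60), Step(72, 60))
SAMPLING_CYCLES = (25, 30, 35, 40)


def hold_seconds_after(n_cycles: int) -> int:
    """Total programmed hold time once n_cycles have completed."""
    per_cycle = sum(step.seconds for step in CYCLE)
    return INITIAL_DENATURATION.seconds + n_cycles * per_cycle


# Hold time elapsed at each point where a sample is withdrawn.
schedule = {n: hold_seconds_after(n) for n in SAMPLING_CYCLES}
```

Withdrawing samples at several cycle numbers is what makes the assay
semi-quantitative: a transcript that yields a visible product only at the
latest sampling point is present at a lower level than one visible earlier.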

Chromosome localization:

Chromosome localization of hnhp1 was performed using the
radiation hybrid panel Stanford G3. This panel was provided by the
human genome center at the Weizmann Institute. A 225 bp genomic
fragment of the hnhp1 gene was amplified using the gene specific primers
hn1u350 (SEQ ID NO:12) and hn1l116 (SEQ ID NO:10). The PCR program
was as follows: 94 °C, 3 minutes, followed by 39 cycles of 94 °C, 45
seconds, 64 °C, 1 minute, 72 °C, 1 minute. Analysis of results was done
through the RH server at the Stanford human genome center.

EXAMPLE 1

Cloning an EST for a novel heparanase gene

The entire amino acid sequence of human heparanase (SEQ ID
NO:9) was used to screen the human EST database for homologous
sequences. Screening was performed using the BLAST 2.0 server at the
NCBI, basic BLAST search, tblastn program.

A distantly homologous fragment was pulled out, accession
number AI222323, IMAGE clone number 1843155 from the
Soares_NFL_T_GBC_S1 Homo sapiens cDNA library prepared from
testis, B-cells and fetal lungs. The search values for this sequence were as
follows: Score = 38.3 bits (87), Expect = 0.15, Identities = 16/36 (44 %),
Positives = 22/36 (60 %). The sequence of accession number AI222323
contains 378 nucleotides of the 3' of clone 1843155 (complementary to
nucleotides 165-543 of SEQ ID NO:23).

This clone was purchased from the IMAGE consortium. It
contained an insert of 560 bp (SEQ ID NO:23). The entire nucleotide
sequence was determined and compared to the hpa cDNA encoding
human heparanase. The homology between clone 1843155 and hpa cDNA
was restricted to the 3' region of the cDNA clone. There was 59 %
homology between nucleotides 99-275 of clone 1843155 (SEQ ID
NO:23) and 1532-1708 of hpa (SEQ ID NO:24). The deduced amino acid
sequence of this region had 60 % homology (identical + similar) to amino
acids 488-542 (SEQ ID NO:9) of human heparanase. The downstream
sequence (nucleotides 276-560, SEQ ID NO:23) represents a 3'
untranslated region and a poly A tail. The upstream sequence, nucleotides
1-98 (SEQ ID NO:23), was unrelated to heparanase. This unrelated
sequence was found to be identical to a different cDNA clone from the
same library. Therefore, the human EST clone 1843155, obtained from
the IMAGE consortium, is assumed to be a chimera, which contains two
unrelated partial cDNAs ligated to a single vector.

EXAMPLE 2
Cloning a cDNA for a novel heparanase gene
In order to isolate the entire cDNA, three primers were designed
according to the sequence of clone 1843155. The cDNA was amplified
from placenta cDNA by Marathon RACE (rapid amplification of cDNA
ends) (Clontech, Palo Alto, California) according to the manufacturer's
instructions. The first cycle was performed with the gene specific primer
hn1l116 (SEQ ID NO:10) and the universal primer Ap1 (SEQ ID NO:20).
The second cycle was performed with the gene specific primer hn1l230
(SEQ ID NO:11) and the universal primer Ap2 (SEQ ID NO:21).
Following amplification, a diffuse band of approximately 1.7 kb was
obtained. This cDNA amplification product was subcloned into pGEM
T-easy (Promega, Madison, WI) and the nucleotide sequences of three
independent clones pn5, pn6 and pn9 were determined. The consensus
sequence of the longest cDNA, pn6, appears in Figure 1 (SEQ ID NOs:1, 2
and 3). It is 2060 nucleotides long and it contains an open reading frame of
1776 nucleotides, which encodes a polypeptide of 592 amino acids, with a
calculated molecular weight of 66.5 kDa. The newly cloned gene was
designated hnhp1. The two shorter forms, pn9 and pn5, and their deduced
amino acid sequences are set forth in SEQ ID NOs:4 and 6 and SEQ ID
NOs:5 and 7, respectively. Pn9 and pn5 were identical to pn6, however
each one of them contained an in frame deletion as a result of alternative
splicing. Pn9 contains a deletion of 162 nucleotides, 473-634 of SEQ ID
NO:1, which correspond to amino acids 150-203 of SEQ ID NO:3. As a
result pn9 encodes a putative polypeptide of 538 amino acids (SEQ ID
NO:5) having a calculated molecular weight of 60.4 kDa. Pn5 contains a
deletion of 336 nucleotides, 473-808 of SEQ ID NO:1, which correspond
to amino acids 150-261 of SEQ ID NO:3; thus, it encodes a putative
polypeptide of 480 amino acids (SEQ ID NO:7) having a calculated
molecular weight of 53.9 kDa. The 11th amino acid residue of SEQ ID
NO:3 is methionine. It is generally accepted that the first methionine
serves as a translation start site in mammals; however, the nucleotides
surrounding the second ATG fit better with the Kozak consensus sequence
for translation start site. Translation may thus start at the second
methionine and produce a protein of 581 amino acids with a calculated
molecular weight of 65.4 kDa. The presence of transcripts of variable
length was confirmed by PCR amplification of the hnhp1 cDNA using two
gene specific primers: pn9-312u (SEQ ID NO:14), which is located close
to the 5' end, and hn1l230 (SEQ ID NO:11), which overlaps the stop codon
at the 3' end of the open reading frame. Amplification was performed
from Marathon ready cDNA prepared from placenta and from testis. The
PCR products are shown in Figure 3. Four bands were obtained from
placenta: two major bands of 1.45 and 1.6 kb, similar to pn9 and pn6, and
two minor bands, one of 1.35 kb, similar to pn5, and a second one of 1.8
kb. The sequence of the latter has not yet been determined. Amplification
of testis cDNA resulted in a different pattern. Four bands of 1.35, 1.65,
1.85 and 2.05 kb were observed and a minor one of 1.5 kb. The various
forms appear to represent products of alternative splicing. Since the
deletions characterized so far retain an open reading frame, the translation
products of the various cDNAs may constitute a protein family. The
comparison between the amino acid sequence of hnhp1 and heparanase is
shown in Figure 3. Using the Gap program of the GCG package, which
aligns the entire amino acid sequences, the homology between the two
proteins is 45.5 % identity and 7.3 % similarity, a total homology of 52.8 %
(gap creation penalty - 50, gap extension penalty - 3). The BestFit
program defines the region of the best homology between the two
sequences. Using this program, the homology between the two amino acid
sequences starts at position 63 of hnhp1 (SEQ ID NO:3) and position 41
of heparanase (SEQ ID NO:9) and is 47.5 % identity and 7.8 % similarity,
i.e., homology of 55.3 %. The homology between the nucleotide sequences
of hnhp1 and hpa is 57 % as calculated by the BestFit program. The
homologous region is located between nucleotides 638-1812 of hnhp1
(SEQ ID NO:1) and nucleotides 564-1708 of hpa (SEQ ID NO:24). Using
the Gap program the homology is 51 % over the entire sequence (gap
creation penalty - 50, gap extension penalty - 3).
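
By way of a non-limiting illustration, the arithmetic relating the open
reading frame, the two deletions and the lengths of the encoded
polypeptides can be checked with the following Python sketch. All numbers
are taken from this Example; the variable and function names are
hypothetical. A deletion preserves the reading frame only when its length
is a multiple of three, which the helper verifies before converting
nucleotides to codons.

```python
# Illustrative sketch only: hypothetical names, numbers from Example 2.
ORF_NT = 1776  # open reading frame of pn6, in nucleotides

# Deleted nucleotide ranges (inclusive, numbering of SEQ ID NO:1).
DELETIONS = {"pn9": (473, 634), "pn5": (473, 808)}


def codons_removed(first_nt: int, last_nt: int) -> int:
    """Number of codons lost by an in-frame deletion of first_nt..last_nt."""
    length = last_nt - first_nt + 1
    if length % 3 != 0:
        raise ValueError(f"deletion of {length} nt would shift the frame")
    return length // 3


full_length_aa = ORF_NT // 3  # 592 residues encoded by pn6
isoform_aa = {
    name: full_length_aa - codons_removed(*span)
    for name, span in DELETIONS.items()
}
```

The 162 and 336 nucleotide deletions remove 54 and 112 codons,
respectively, which agrees with the stated amino acid ranges 150-203 and
150-261 and with the stated lengths of 538 and 480 amino acids.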

EXAMPLE 3

Zoo blot

Hnhp1 cDNA was used as a probe to detect homologous sequences
in human DNA and in DNA of various animals. The autoradiogram of the
Southern analysis is presented in Figure 4. Several bands were detected in
human DNA. Several intense bands were detected in all mammals, while
faint bands were detected in chicken. This correlates with the
phylogenetic relation between human and the tested animals. The intense
bands indicate that hnhp1 is conserved among mammals as well as in more
genetically distant organisms. The multiple band patterns suggest that in
all animals, the hnhp1 locus occupies a large genomic region. Several
specific bands disappeared after stringent wash. These may represent
homologous sequences and suggest the existence of a gene family, which
can be isolated based on their homology to the human hnhp1 reported here.

EXAMPLE 4
Comparison to heparanase via cross hybridization
In order to check the capability of hpa and hnhp1 to cross
hybridize under low stringency conditions, the entire coding regions of the
human hpa and hnhp1 were amplified by PCR. Human hpa was amplified
from platelet mRNA by RT-PCR using the primers hpu-685 (SEQ ID
NO:16) and hpl967 (SEQ ID NO:17), and hnhp1 was amplified from testis
using the primers hn1l230 (SEQ ID NO:11) and pn9-312u (SEQ ID
NO:14). The products were quantified and samples of 100 pg and 1 ng
were run on agarose gel and subjected to Southern hybridization. The
membranes were probed with 32P-labeled hpa cDNA and with hnhp1
cDNA. No cross hybridization was observed (Figure 5) even after
overexposure for 5 days. Since hpa is the most similar sequence known
today to that of hnhp1, this experiment indicates that the bands detected in
the autoradiograph of Figure 4 are of the hnhp1 gene or of yet unknown
sequences homologous thereto, which might constitute a gene family.
This further indicates that such sequences are isolatable using hnhp1
as a probe to screen the relevant libraries, or using hnhp1 derived PCR
primers to amplify the relevant cDNA or DNA sequences.

EXAMPLE 5
Chromosome localization

The chromosome localization of hnhp1 was determined using the G3
radiation hybrid panel. Hnhp1 was amplified from 83 human/mouse
radiation hybrids. The results were analyzed by the RH server and the
hnhp1 gene was mapped to chromosome 10, next to the marker
SHGC-57721. The results also indicated a possibility of a second copy of
the gene.

EXAMPLE 6
Expression pattern of hnhp1
The tissue distribution of hnhp1 transcripts was determined using
calibrated human cDNA panels (Clontech, Palo Alto, Ca). The results are
shown in Table 1 below. Expression level is generally low. PCR products
were clearly observed only after 40 cycles of amplification.

TABLE 1

Tissue

Bone marrow
Liver
Lymph node
Leukocytes
Spleen
Thymus
Tonsil
Colon
Ovary
Prostate
Small intestine
Testis

EXAMPLE 7
Cloning of a mouse homologue
Screening of the mouse EST database with the amino acid sequence
of heparanase as well as of hnhp1 pulled out a mouse EST clone, which
shares distant homology with heparanase and a remarkably high homology
with hnhp1. The EST clone 1378452, accession number AI019269, from
mouse thymus was 351 nucleotides long and it is set forth in SEQ ID NO:8.
It has 61-63 % identity over 161 nucleotides (191-351, SEQ ID NO:8) to
the human (SEQ ID NO:24) and mouse (SEQ ID NO:15) hpa nucleotide
sequences, and 93 % to the hnhp1 nucleotide sequence (SEQ ID NO:1) using
the BestFit program of the GCG package. The nucleotide sequence of this
clone did not contain an open reading frame. Two frame shifts were
identified in the sequence found in the EST database, as compared to the
hnhp1 sequence.

WO 01/00643    PCT/IL00/00358

These frame shifts were later confirmed by nucleotide
sequence analysis of this clone as well as by isolation of this fragment
from BL6 mouse melanoma cells and determination of its nucleotide
sequence. This mouse gene is transcribed at very low levels. Low levels
of expression were indicated as no amplification products were obtained
following 40 cycles of PCR from a mouse cDNA panel (Clontech, Palo
Alto, Ca) which included cDNA from mouse heart, brain, spleen, lung,
liver, skeletal muscle, kidney, testis and embryos of 7, 11, 15, and 17 days.
The amplification was performed using the gene specific primers mn1u118
(SEQ ID NO:18) and mn1l563 (SEQ ID NO:19).

EXAMPLE 8
Expression of hnhp1 in mammalian cells
A mammalian expression vector was constructed in order to
over-express hnhp1 in human cells. To enable detection of the Hnhp1
translation product, the hnhp1 expression vector was designed to encode a
C-terminal tagged hn1 protein. A DNA sequence, which encodes eight
amino acids FLAG (Kodak), was fused to the 3' end of the hnhp1 open
reading frame.

Fusion of the FLAG sequence to the hnhp1 coding sequence was
generated by PCR amplification using the primer: hn1-c-flag: 5'-

A-3' (SEQ ID NO:25) and the primer: pn9-312u (SEQ ID NO:14). The
PCR program was as follows: 94 °C, 3 min followed by 5 cycles of: 94
°C, 45 seconds, 50 °C, 45 seconds and 72 °C, 2 minutes, and then 32
cycles of 94 °C, 45 seconds, 64 °C, 45 seconds and 72 °C, 2 min.

The amplification product was subcloned into pGEM-T-easy, and
the sequence was verified. The resulting plasmids were designated
pGEM-pn6F and pGEM-pn9F.

Two constructs were generated in the pSI mammalian expression
vector (Promega): the first contained the complete hnhp1 sequence (pn6)
and the second contained the alternative splice form (pn9). The pSI-pn6
expression vector was constructed by triple ligation of the following
fragments: an EcoRI - BamHI fragment, which contains the 5' end of
hn1-pn6, excised from pGEM-T-easy-pn9, a BamHI - NotI fragment, which
contains the 3' FLAG tagged hnhp1, excised from pGEM-pn6F, and pSI
digested with EcoRI - NotI.

The pSI-pn9 expression vector was constructed similarly, by triple
ligation of the following fragments: an EcoRI - SspI fragment, which
contains the 5' end of hnhp1-pn6, excised from pGEM-T-easy-pn9, an
SspI - NotI fragment, which contains the 3' FLAG tagged hnhp1, excised
from pGEM-pn6F, and pSI digested with EcoRI - NotI.

The resulting plasmids were transfected into human embryonal
kidney 293 cells, using the Fugene transfection reagent (Boehringer
Mannheim). Forty-eight hours following transfection, cells were harvested
and proteins were analysed by western blot. Cell lysates of 2.5x10^ were
separated by SDS-PAGE, transferred onto a nylon membrane and
incubated with anti FLAG antibody at 1:1000 dilution (Kodak anti FLAG
M2 cat: IB 13025, final concentration 10 µg/ml). Proteins of
approximately 65 kDa and 60 kDa were detected in cells transfected with
pSI-pn6F and pSI-pn9F, respectively. These proteins are similar in size to
those predicted by the calculated molecular weight for the translation
products of the corresponding open reading frames. This demonstrates that
both the entire hnhp1 cDNA and the pn9 splice form are successfully
transcribed and translated in human 293 cells. However, unlike
heparanase, the Hnhp1 protein products do not undergo major processing
in these cells.

Although the invention has been described in conjunction with
specific embodiments thereof, it is evident that many alternatives,
modifications and variations will be apparent to those skilled in the art.
Accordingly, it is intended to embrace all such alternatives, modifications
and variations that fall within the spirit and broad scope of the appended
claims. All publications cited herein are incorporated by reference in their
entirety.

WHAT IS CLAIMED IS:

1. An isolated nucleic acid comprising a polynucleotide 
hybridizable with SEQ ID NOs:l, 4, 6 or portions thereof at 68 in 6 x 
SSC, 1 % SDS, 5 X Denharts, 10 % dextran sulfate, 100 \xg/m\ salmon 
sperm DNA, and 32p labeled probe and wash at 68 °C with 3 x SSC and 
0-1 %SDS. 

2. An isolated nucleic acid comprising a polynucleotide at least 
60 % identical with SEQ ID NOs:l, 4, 6 or portions thereof as determined 
using the Bestfit procedure of the DNA sequence analysis software 
package developed by the Genetic Computer Group (GCG) at the 
university of Wisconsin (gap creation penalty - 50, gap extension penalty - 
3). 

3. The isolated nucleic acid of claim 2, wherein said 
polynucleotide is as set forth in SEQ ID NOs:1, 4, 6 or portions thereof.

4. An isolated nucleic acid comprising a polynucleotide 
encoding a polypeptide being at least 60 % homologous with SEQ ID 
NOs:3, 5, 7 or portions thereof as determined using the Bestfit procedure 
of the DNA sequence analysis software package developed by the Genetic 
Computer Group (GCG) at the university of Wisconsin (gap creation 
penalty - 50, gap extension penalty - 3). 

5. A recombinant protein comprising a polypeptide encoded by 
the polynucleotide of claim 1.

6. A recombinant protein comprising a polypeptide encoded by 
the polynucleotide of claim 2. 

7. A recombinant protein comprising a polypeptide encoded by 
the polynucleotide of claim 3. 

8. A recombinant protein comprising a polypeptide encoded by 
the polynucleotide of claim 4. 
9. A recombinant protein comprising a polypeptide at least 60
% homologous with SEQ ID NOs:3, 5, 7 or portions thereof as determined 
using the Bestfit procedure of the DNA sequence analysis software 
package developed by the Genetic Computer Group (GCG) at the 
university of Wisconsin (gap creation penalty - 50, gap extension penalty - 
3). 

10. The recombinant protein of claim 9, wherein said
polypeptide is as set forth in SEQ ID NOs:3, 5, 7 or portions thereof.

11. A nucleic acid construct comprising the isolated nucleic acid
of claim 1.

12. A nucleic acid construct comprising the isolated nucleic acid
of claim 2.

13. A nucleic acid construct comprising the isolated nucleic acid
of claim 3.

14. A nucleic acid construct comprising the isolated nucleic acid
of claim 4.

15. A host cell comprising the nucleic acid construct of claim 11.

16. A host cell comprising the nucleic acid construct of claim 12.

17. A host cell comprising the nucleic acid construct of claim 13.

18. A host cell comprising the nucleic acid construct of claim 14.



19. An antisense oligonucleotide comprising a polynucleotide or 
a polynucleotide analog of at least 10 bases being hybridizable in vivo, 
under physiological conditions, with: 
(i) a portion of a polynucleotide strand encoding a polypeptide
at least 60 % homologous with SEQ ID NOs:3, 5, 7 or 
portions thereof as determined using the Bestfit procedure of 
the DNA sequence analysis software package developed by 
the Genetic Computer Group (GCG) at the university of 
Wisconsin (gap creation penalty - 50, gap extension penalty - 
3); or 

(ii) a portion of a polynucleotide strand at least 60 % identical 
with SEQ ID NOs:1, 4, 6 or portions thereof as determined
using the Bestfit procedure of the DNA sequence analysis 
software package developed by the Genetic Computer Group 
(GCG) at the university of Wisconsin (gap creation penalty - 
50, gap extension penalty - 3).

20. A ribozyme comprising the antisense oligonucleotide of 
claim 19 and a ribozyme sequence. 

21. An antisense nucleic acid construct comprising a promoter 
sequence and a polynucleotide sequence directing the synthesis of an 
antisense RNA sequence of at least 10 bases being hybridizable in vivo, 
under physiological conditions, with: 

(i) a portion of a polynucleotide strand encoding a polypeptide 
at least 60 % homologous with SEQ ID NOs:3, 5, 7 or 
portions thereof as determined using the Bestfit procedure of 
the DNA sequence analysis software package developed by 
the Genetic Computer Group (GCG) at the university of 
Wisconsin (gap creation penalty - 50, gap extension penalty - 
3); or 

(ii) a portion of a polynucleotide strand at least 60 % identical 
with SEQ ID NOs:1, 4, 6 or portions thereof as determined
using the Bestfit procedure of the DNA sequence analysis 
software package developed by the Genetic Computer Group 
(GCG) at the university of Wisconsin (gap creation penalty - 
50, gap extension penalty - 3). 
CGCTTAATTCTAGAAGAGGGATTGA

ATGAGGGTGCTTTGTGCCTTCCCTGAAGCCATGCCCTCCAGCAACTCCCGCCCCCCCGCG 
MRVLCAFPEAMPSSNSRPPA 

TGCCTAGCCCCGGGGGCTCTCTACTTGGCTCTGTTGCTCCATCTCTCCCTTTCCTCCCAG 
CLAPGALYLALLLHLSLSSQ 

GCTGGAGACAGGAGACCCTTGCCTGTAGACAGAGCTGCAGGTTTGAAGGAAAAGACCCTG 
AGDRRPLPVDRAAGLKEKTL 

ATTCTACTTGATGTGAGCACCAAGAACCCAGTCAGGACAGTCAATGAGAACTTCCTCTCT 
ILLDVSTKNPVRTVNENFLS 

CTGCAGCTGGATCCGTCCATCATTCATGATGGCTGGCTCGATTTCCTAAGCTCCAAGCGC 
LQLDPSIIHDGWLDFLSSKR

TTGGTGACCCTGGCCCGGGGACTTTCGCCCGCCTTTCTGCGCTTCGGGGGCAAAAGGACC 
LVTLARGLSPAFLRFGGKRT 

GACTTCCTGCAGTTCCAGAACCTGAGGAACCCGGCGAAAAGCCGCGGGGGCCCGGGCCCG 
DFLQFQNLRNPAKSRGGPGP 

GATTACTATCTCAAAAACTATGAGGATGACATTGTTCGAAGTGATGTTGCCTTAGATAAA 
DYYLKNYEDDIVRSDVALDK 

CAGAAAGGCTGCAAGATTGCCCAGCACCCTGATGTTATGCTGGAGCTCCAAAGGGAGAAG
QKGCKIAQHPDVMLELQREK 

GCAGCTCAGATGCATCTGGTTCTTCTAAAGGAGCAATTCTCCAATACTTACAGTAATCTC 
AAQMHLVLLKEQFSNTYSNL 

ATATTAACAGCCAGGTCTCTAGACAAACTTTATAACTTTGCTGATTGCTCTGGACTCCAC 
ILTARSLDKLYNFADCSGLH

CTGATATTTGCTCTAAATGCACTGCGTCGTAATCCCAATAACTCCTGGAACAGTTCTAGT 
LIFALNALRRNPNNSWNSSS

GCCCTGAGTCTGTTGAAGTACAGCGCCAGCAAAAAGTACAACATTTCTTGGGAACTGGGT 
ALSLLKYSASKKYNISWELG 

AATGAGCCAAATAACTATCGGACCATGCATGGCCGGGCAGTAAATGGCAGCCAGTTGGGA 
NEPNNYRTMHGRAVNGSQLG 

AAGGATTACATCCAGCTGAAGAGCCTGTTGCAGCCCATCCGGATTTATTCCAGAGCCAGC 
KDYIQLKSLLQPIRIYSRAS 

TTATATGGCCCTAATATTGGGCGGCCGAGGAAGAATGTCATCGCCCTCCTAGATGGATTC 
LYGPNIGRPRKNVIALLDGF

ATGAAGGTGGCAGGAAGTACAGTAGATGCAGTTACCTGGCAACATTGCTACATTGATGGC
MKVAGSTVDAVTWQHCYIDG

CGGGTGGTCAAGGTGATGGACTTCCTGAAAACTCGCCTGTTAGACACACTCTCTGACCAG 
RVVKVMDFLKTRLLDTLSDQ 

ATTAGGAAAATTCAGAAAGTGGTTAATACATACACTCCAGGAAAGAAGATTTGGCTTGAA 
IRKIQKVVNTYTPGKKIWLE 

GGTGTGGTGACCACCTCAGCTGGAGGCACAAACAATCTATCCGATTCCTATGCTGCAGGA 
GVVTTSAGGTNNLSDSYAAG

TTCTTATGGTTGAACACTTTAGGAATGCTGGCCAATCAGGGCATTGATGTCGTGATACGG 
FLWLNTLGMLANQGIDVVIR 

CACTCATTTTTTGACCATGGATACAATCACCTCGTGGACCAGAATTTTAACCCATTACCA 
HSFFDHGYNHLVDQNFNPLP 

GACTACTGGCTCTCTCTCCTCTACAAGCGCCTGATCGGCCCCAAAGTCTTGGCTGTGCAT 
DYWLSLLYKRLIGPKVLAVH 

GTGGCTGGGCTCCAGCGGAAGCCACGGCCTGGCCGAGTGATCCGGGACAAACTAAGGATT 
VAGLQRKPRPGRVIRDKLRI 
ATCATCAACTTGCATCGATCAAGAAAGAAAATCAAGCTGGCTGGGACTCTCAGAGACAAG 1585
IINLHRSRKKIKLAGTLRDK 

CTGGTTCACCAGTACCTGCTGCAGCCCTATGGGCAGGAGGGCCTAAAGTCCAAGTCAGTG 1645
LVHQYLLQPYGQEGLKSKSV 

CAACTGAATGGCCAGCCCTTAGTGATGGTGGACGACGGGACCCTCCCAGAATTGAAGCCC 1705 
QLNGQPLVMVDDGTLPELKP 

CGCCCCCTTCGGGCCGGCCGGACATTGGTCATCCCTCCAGTCACCATGGGCTTTTTTGTG 1765
RPLRAGRTLVIPPVTMGFFV 

GTCAAGAATGTCAATGCTTTGGCCTGCCGCTACCGATAAGCTATCCTCACACTCATGGCT 1825 
VKNVNALACRYR* 

ACCAGTGGGCCTGCTGGGCTGCTTCCACTCCTCCACTCCAGTAGTATCCTCTGTTTTCAG 1885 

ACATCCTAGCAACCAGCCCCTGCTGCCCCATCCTGCTGGAATCAACACAGACTTGCTCTC 1945

CAAAGAGACTAAATGTCATAGCGTGATCTTAGCCTAGGTAGGCCACATCCATCCCAAAGG 2005 

AAAATGTAGACATCACCTGTACCTATATAAGGATAAAGGCATGTGTATAGAGCAA 2060 
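
The pairing of each nucleotide row with its one-letter translation in the sequence above follows the standard genetic code. As an illustrative sketch (the helper name `translate` and the table construction are assumptions introduced here, not part of the application), the first row of the open reading frame can be translated as follows:

```python
from itertools import product

# Standard genetic code, built in the conventional T, C, A, G base order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str) -> str:
    """Translate a DNA string in frame 1, stopping at the first stop codon."""
    protein = []
    for i in range(0, len(dna) - 2, 3):
        amino_acid = CODON_TABLE[dna[i:i + 3].upper()]
        if amino_acid == "*":  # stop codon ends the polypeptide
            break
        protein.append(amino_acid)
    return "".join(protein)


# First 60 nucleotides of the open reading frame, copied from the figure.
orf_start = "ATGAGGGTGCTTTGTGCCTTCCCTGAAGCCATGCCCTCCAGCAACTCCCGCCCCCCCGCG"
print(translate(orf_start))  # MRVLCAFPEAMPSSNSRPPA
```

The printed peptide matches the translation row shown under the first line of the open reading frame.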
1 MRVLCAFPEAMPSSNSRPPACLAPGALYLALLLHLSLSSQAGDRRPLPVD 50
1 MLLRSKPALPPPLMLLLLGPLGPLSPGALP 30

51 RAAGLKEKTLILLDVSTKNPVRTVNENFLSLQLDPSIIHD.GWLDFLSSK 99
31 RPA..QAQDVVDLDFFTQEPLHLVSPSFLSVTIDANLATDPRFLILLGSP 78

100 RLVTLARGLSPAFLRFGGKRTDFLQFQNLRNPAKSRGGPGPDYYLKNYED 149
79 KLRTLARGLSPAYLRFGGTKTDFLIF DPKKESTFEERSYWQSQVNQ 124

150 DIVRSDVALDKQKGCKIAQHPDVMLELQREKAAQMHLVLLKEQFSNTYSN 199
125 DI CKYGSIPPDVEEKLRLEWPYQEQLLLREHYQKKFKN 162

200 LILTARSLDKLYNFADCSGLHLIFALNALRRNPNNSWNSSSALSLLKYSA 249
163 STYSRSSVDVLYTFANCSGLDLIFGLNALLRTADLQWNSSNAQLLLDYCS 212

250 SKKYNISWELGNEPNNYRTMHGRAVNGSQLGKDYIQLKSLLQPIRIYSRA 299
213 SKGYNISWELGNEPNSFLKKADIFINGSQLGEDFIQLHKLLRK.STFKNA 261

300 SLYGPNIGRPRKNVIALLDGFMKVAGSTVDAVTWQHCYIDGRVVKVMDFL 349
262 KLYGPDVGQPRRKTAKMLKSFLKAGGEVIDSVTWHHYYLNGRTATREDFL 311

350 KTRLLDTLSDQIRKIQKVVNTYTPGKKIWLEGVVTTSAGGTNNLSDSYAA 399
312 NPDVLDIFISSVQKVFQVVESTRPGKKVWLGETSSAYGGGAPLLSDTFAA 361

400 GFLWLNTLGMLANQGIDVVIRHSFFDHGYNHLVDQNFNPLPDYWLSLLYK 449
362 GEMWLDKLGLSARMGIEVVMRQVFFGAGNYHLVDENFDPLPDYWLSLLFK 411

450 RLIGPKVLAVHVAGLQRKPRPGRVIRDKLRIYAHCTNHHNHNYVRGSITL 499
412 KLVGTKVLMASVQGSKRR KLRVYLHCTNTDNPRYKEGDLTL 452

500 FIINLHRSRKKIKLAGTLRDKLVHQYLLQPYGQEGLKSKSVQLNGQPLVM 549
453 YAINLHNVTKYLRLPYPFSNKQVDKYLLRPLGPHGLLSKSVQLNGLTLKM 502

550 VDDGTLPELKPRPLRAGRTLVIPPVTMGFFVVKNVNALACRYR 592
503 VDDQTLPPLMEKPLRPGSSLGLPAFSYSFFVIRNAKVAACI. 543
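
For readers who want to quantify a block of such a pairwise alignment, the following sketch counts identical residues over the aligned, non-gap columns of two rows. It is a simplified illustration only: the function name `percent_identity` and the convention of skipping gap columns are assumptions made here, and the result is not claimed to reproduce the figure computed by the Bestfit program.

```python
def percent_identity(row_a: str, row_b: str, gap: str = ".") -> float:
    """Percent of identical residues over columns where neither row has a gap."""
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have the same length")
    compared = 0
    matches = 0
    for residue_a, residue_b in zip(row_a, row_b):
        if residue_a == gap or residue_b == gap:
            continue  # gap columns are not counted in this convention
        compared += 1
        if residue_a == residue_b:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


# One aligned block from the comparison above (residues 250-299 of the upper row).
upper = "SKKYNISWELGNEPNNYRTMHGRAVNGSQLGKDYIQLKSLLQPIRIYSRA"
lower = "SKGYNISWELGNEPNSFLKKADIFINGSQLGEDFIQLHKLLRK.STFKNA"
print(f"{percent_identity(upper, lower):.1f} % identity over this block")
```

A whole-alignment figure would apply the same count across all blocks, with gap handling chosen to match the scoring program actually used.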
SEQUENCE LISTING

(1) GENERAL INFORMATION:

(i) APPLICANT: Iris Pecker et al.
(ii) TITLE OF INVENTION: POLYNUCLEOTIDES AND POLYPEPTIDES ENCODED THEREBY
(iii) NUMBER OF SEQUENCES: 24
(iv) CORRESPONDENCE ADDRESS:
(A) ADDRESSEE: Sol Sheinbein c/o Anthony Castorina
(B) STREET: 2001 Jefferson Davis Highway, Suite 207
(C) CITY: Arlington
(D) STATE: Virginia
(E) COUNTRY: United States of America
(F) ZIP: 22202
(v) COMPUTER READABLE FORM:
(A) MEDIUM TYPE: 1.44 megabyte, 3.5" microdisk
(B) COMPUTER: Twinhead Slimnote-890TX
(C) OPERATING SYSTEM: MS DOS version 6.2, Windows version 3.11
(D) SOFTWARE: Word for Windows version 2.0 converted to an ASCII file
(vi) CURRENT APPLICATION DATA:
(A) APPLICATION NUMBER:
(B) FILING DATE:
(C) CLASSIFICATION:
(vii) PRIOR APPLICATION DATA:
(A) APPLICATION NUMBER: 60/140,801
(B) FILING DATE: June 25, 1999
(viii) ATTORNEY/AGENT INFORMATION:
(A) NAME: Sheinbein, Sol
(B) REGISTRATION NUMBER: 25,457
(C) REFERENCE/DOCKET NUMBER: 20105
(ix) TELECOMMUNICATION INFORMATION:
(A) TELEPHONE: 972-3-6127676
(B) TELEFAX: 972-3-6127575
(C) TELEX:

(2) INFORMATION FOR SEQ ID NO:1:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 2060

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:1:

CGCTTAATTC TAGAAGAGGG ATTGAATGAG GGTGCTTTGT GCCTTCCCTG 50 

AAGCCATGCC CTCCAGCAAC TCCCGCCCCC CCGCGTGCCT AGCCCCGGGG 100 

GCTCTCTACT TGGCTCTGTT GCTCCATCTC TCCCTTTCCT CCCAGGCTGG 150 

AGACAGGAGA CCCTTGCCTG TAGACAGAGC TGCAGGTTTG AAGGAAAAGA 200 

CCCTGATTCt ACTTGATGTG AGCACCAAGA ACCCAGTCAG GACAGTCAAT 250 

GAGAACTTCC TCTCTCTGCA GCTGGATCCG TCCATCATTC ATGATGGCTG 300 

GCTCGATTTC CTAAGCTCCA AGCGCTTGGT GACCCTGGCC CGGGGACTTT 350 

CGCCCGCCTT TCTGCGCTTC GGGGGCAAAA GGACCGACTT CCTGCAGTTC 400 

CAGAACCTGA GGAACCCGGC GAAAAGCCGC GGGGGCCCGG GCCCGGATTA 450 

CTATCTCAAA AACTATGAGG ATGACATTGT TCGAAGTGAT GTTGCCTTAG 500 

ATAAACAGAA AGGCTGCAAG ATTGCCCAGC ACCCTGATGT TATGCTGGAG 550 

CTCCAAAGGG AGAAGGCAGC TCAGATGCAT CTGGTTCTTC TAAAGGAGCA 600 

ATTCTCCAAT ACTTACAGTA ATCTCATATT AACAGCCAGG TCTCTAGACA 650 

AACTTTATAA CTTTGCTGAT TGCTCTGGAC TCCACCTGAT ATTTGCTCTA 700 

AATGCACTGC GTCGTAATCC CAATAACTCC TGGAACAGTT CTAGTGCCCT 750 

GAGTCTGTTG AAGTACAGCG CCAGCAAAAA GTACAACATT TCTTGGGAAC 800 

TGGGTAATGA GCCAAATAAC TATCGGACCA TGCATGGCCG GGCAGTAAAT 850 

GGCAGCCAGT TGGGAAAGGA TTACATCCAG CTGAAGAGCC TGTTGCAGCC 900 

CATCCGGATT TATTCCAGAG CCAGCTTATA TGGCCCTAAT ATTGGGCGGC 950 

CGAGGAAGAA TGTCATCGCC CTCCTAGATG GATTCATGAA GGTGGCAGGA 1000 

AGTACAGTAG ATGCAGTTAC CTGGCAACAT TGCTACATTG ATGGCCGGGT 1050 

GGTCAAGGTG ATGGACTTCC TGAAAACTCG CCTGTTAGAC ACACTCTCTG 1100 

ACCAGATTAG GAAAATTCAG AAAGTGGTTA ATACATACAC TCCAGGAAAG 1150 

AAGATTTGGC TTGAAGGTGT GGTGACCACC TCAGCTGGAG GCACAAACAA 1200 

TCTATCCGAT TCCTATGCTG CAGGATTCTT ATGGTTGAAC ACTTTAGGAA 1250 

TGCTGGCCAA TCAGGGCATT GATGTCGTGA TACGGCACTC ATTTTTTGAC 1300 

CATGGATACA ATCACCTCGT GGACCAGAAT TTTAACCCAT TACCAGACTA 1350 

CTGGCTCTCT CTCCTCTACA AGCGCCTGAT CGGCCCCAAA GTCTTGGCTG 1400

TGCATGTGGC TGGGCTCCAG CGGAAGCCAC GGCCTGGCCG AGTGATCCGG 1450 

GACAAACTAA GGATTTATGC TCACTGCACA AACCACCACA ACCACAACTA 1500 

CGTTCGTGGG TCCATTACAC TTTTTATCAT CAACTTGCAT CGATCAAGAA 1550 

AGAAAATCAA GCTGGCTGGG ACTCTCAGAG ACAAGCTGGT TCACCAGTAC 1600 

CTGCTGCAGC CCTATGGGCA GGAGGGCCTA AAGTCCAAGT CAGTGCAACT 1650 

GAATGGCCAG CCCTTAGTGA TGGTGGACGA CGGGACCCTC CCAGAATTGA 1700 

AGCCCCGCCC CCTTCGGGCC GGCCGGACAT TGGTCATCCC TCCAGTCACC 1750 

ATGGGCTTTT TTGTGGTCAA GAATGTCAAT GCTTTGGCCT GCCGCTACCG 1800 
ATAAGCTATC CTCACACTCA TGGCTACCAG TGGGCCTGCT GGGCTGCTTC 1850

CACTCCTCCA CTCCAGTAGT ATCCTCTGTT TTCAGACATC CTAGCAACCA 1900

GCCCCTGCTG CCCCATCCTG CTGGAATCAA CACAGACTTG CTCTCCAAAG 1950

AGACTAAATG TCATAGCGTG ATCTTAGCCT AGGTAGGCCA CATCCATCCC 2000

AAAGGAAAAT GTAGACATCA CCTGTACCTA TATAAGGATA AAGGCATGTG 2050

TATAGAGCAA 2060



(2) INFORMATION FOR SEQ ID NO: 2:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 2060

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 2:

C GCT TAA TTC TAG AAG AGG GAT TGA 25
ATG AGG GTG CTT TGT GCC TTC CCT GAA GCC ATG CCC TCC AGC AAC 70 
Met Arg Val Leu Cys Ala Phe Pro Glu Ala Met Pro Ser Ser Asn 
5 10 15 

TCC CGC CCC CCC GCG TGC CTA GCC CCG GGG GCT CTC TAC TTG GCT 115 
Ser Arg Pro Pro Ala Cys Leu Ala Pro Gly Ala Leu Tyr Leu Ala 
20 25 30 

CTG TTG CTC CAT CTC TCC CTT TCC TCC CAG GCT GGA GAC AGG AGA 160 
Leu Leu Leu His Leu Ser Leu Ser Ser Gln Ala Gly Asp Arg Arg
35 40 45 

CCC TTG CCT GTA GAC AGA GCT GCA GGT TTG AAG GAA AAG ACC CTG 205 
Pro Leu Pro Val Asp Arg Ala Ala Gly Leu Lys Glu Lys Thr Leu 
50 55 60 

ATT CTA CTT GAT GTG AGC ACC AAG AAC CCA GTC AGG ACA GTC AAT 250 
Ile Leu Leu Asp Val Ser Thr Lys Asn Pro Val Arg Thr Val Asn
65 70 75 

GAG AAC TTC CTC TCT CTG CAG CTG GAT CCG TCC ATC ATT CAT GAT 295 
Glu Asn Phe Leu Ser Leu Gln Leu Asp Pro Ser Ile Ile His Asp
80 85 90 

GGC TGG CTC GAT TTC CTA AGC TCC AAG CGC TTG GTG ACC CTG GCC 340 
Gly Trp Leu Asp Phe Leu Ser Ser Lys Arg Leu Val Thr Leu Ala 
95 100 105 

CGG GGA CTT TCG CCC GCC TTT CTG CGC TTC GGG GGC AAA AGG ACC 385 
Arg Gly Leu Ser Pro Ala Phe Leu Arg Phe Gly Gly Lys Arg Thr
110 115 120 

GAC TTC CTG CAG TTC CAG AAC CTG AGG AAC CCG GCG AAA AGC CGC 430 
Asp Phe Leu Gln Phe Gln Asn Leu Arg Asn Pro Ala Lys Ser Arg
125 130 135 

GGG GGC CCG GGC CCG GAT TAC TAT CTC AAA AAC TAT GAG GAT GAC 475 
Gly Gly Pro Gly Pro Asp Tyr Tyr Leu Lys Asn Tyr Glu Asp Asp 
140 145 150 

ATT GTT CGA AGT GAT GTT GCC TTA GAT AAA CAG AAA GGC TGC AAG 520 
Ile Val Arg Ser Asp Val Ala Leu Asp Lys Gln Lys Gly Cys Lys
155 160 165 

ATT GCC CAG CAC CCT GAT GTT ATG CTG GAG CTC CAA AGG GAG AAG 565 
Ile Ala Gln His Pro Asp Val Met Leu Glu Leu Gln Arg Glu Lys
170 175 180 

GCA GCT CAG ATG CAT CTG GTT CTT CTA AAG GAG CAA TTC TCC AAT 610 
Ala Ala Gln Met His Leu Val Leu Leu Lys Glu Gln Phe Ser Asn
185 190 195 

ACT TAC AGT AAT CTC ATA TTA ACA GCC AGG TCT CTA GAC AAA CTT 655 
Thr Tyr Ser Asn Leu Ile Leu Thr Ala Arg Ser Leu Asp Lys Leu
200 205 210 

TAT AAC TTT GCT GAT TGC TCT GGA CTC CAC CTG ATA TTT GCT CTA 700 
Tyr Asn Phe Ala Asp Cys Ser Gly Leu His Leu Ile Phe Ala Leu
215 220 225 

AAT GCA CTG CGT CGT AAT CCC AAT AAC TCC TGG AAC AGT TCT AGT 745 
Asn Ala Leu Arg Arg Asn Pro Asn Asn Ser Trp Asn Ser Ser Ser 
230 235 240 

GCC CTG AGT CTG TTG AAG TAC AGC GCC AGC AAA AAG TAC AAC ATT 790 
Ala Leu Ser Leu Leu Lys Tyr Ser Ala Ser Lys Lys Tyr Asn Ile
245 250 255 

TCT TGG GAA CTG GGT AAT GAG CCA AAT AAC TAT CGG ACC ATG CAT 835 
Ser Trp Glu Leu Gly Asn Glu Pro Asn Asn Tyr Arg Thr Met His 
260 265 270 

GGC CGG GCA GTA AAT GGC AGC CAG TTG GGA AAG GAT TAC ATC CAG 880 
Gly Arg Ala Val Asn Gly Ser Gln Leu Gly Lys Asp Tyr Ile Gln
275 280 285 

CTG AAG AGC CTG TTG CAG CCC ATC CGG ATT TAT TCC AGA GCC AGC 925 
Leu Lys Ser Leu Leu Gln Pro Ile Arg Ile Tyr Ser Arg Ala Ser
290 295 300 

TTA TAT GGC CCT AAT ATT GGG CGG CCG AGG AAG AAT GTC ATC GCC 970 
Leu Tyr Gly Pro Asn Ile Gly Arg Pro Arg Lys Asn Val Ile Ala
305 310 315 

CTC CTA GAT GGA TTC ATG AAG GTG GCA GGA AGT ACA GTA GAT GCA 1015 
Leu Leu Asp Gly Phe Met Lys Val Ala Gly Ser Thr Val Asp Ala 
320 325 330

GTT ACC TGG CAA CAT TGC TAC ATT GAT GGC CGG GTG GTC AAG GTG 1060
Val Thr Trp Gln His Cys Tyr Ile Asp Gly Arg Val Val Lys Val
335 340 345

ATG GAC TTC CTG AAA ACT CGC CTG TTA GAC ACA CTC TCT GAC CAG 1105
Met Asp Phe Leu Lys Thr Arg Leu Leu Asp Thr Leu Ser Asp Gln
350 355 360

ATT AGG AAA ATT CAG AAA GTG GTT AAT ACA TAC ACT CCA GGA AAG 1150
Ile Arg Lys Ile Gln Lys Val Val Asn Thr Tyr Thr Pro Gly Lys
365 370 375

AAG ATT TGG CTT GAA GGT GTG GTG ACC ACC TCA GCT GGA GGC ACA 1195
Lys Ile Trp Leu Glu Gly Val Val Thr Thr Ser Ala Gly Gly Thr
380 385 390

AAC AAT CTA TCC GAT TCC TAT GCT GCA GGA TTC TTA TGG TTG AAC 1240
Asn Asn Leu Ser Asp Ser Tyr Ala Ala Gly Phe Leu Trp Leu Asn
395 400 405

ACT TTA GGA ATG CTG GCC AAT CAG GGC ATT GAT GTC GTG ATA CGG 1285
Thr Leu Gly Met Leu Ala Asn Gln Gly Ile Asp Val Val Ile Arg
410 415 420

CAC TCA TTT TTT GAC CAT GGA TAC AAT CAC CTC GTG GAC CAG AAT 1330
His Ser Phe Phe Asp His Gly Tyr Asn His Leu Val Asp Gln Asn
425 430 435

TTT AAC CCA TTA CCA GAC TAC TGG CTC TCT CTC CTC TAC AAG CGC 1375
Phe Asn Pro Leu Pro Asp Tyr Trp Leu Ser Leu Leu Tyr Lys Arg
440 445 450

CTG ATC GGC CCC AAA GTC TTG GCT GTG CAT GTG GCT GGG CTC CAG 1420
Leu Ile Gly Pro Lys Val Leu Ala Val His Val Ala Gly Leu Gln
455 460 465

CGG AAG CCA CGG CCT GGC CGA GTG ATC CGG GAC AAA CTA AGG ATT 1465
Arg Lys Pro Arg Pro Gly Arg Val Ile Arg Asp Lys Leu Arg Ile
470 475 480

TAT GCT CAC TGC ACA AAC CAC CAC AAC CAC AAC TAC GTT CGT GGG 1510
Tyr Ala His Cys Thr Asn His His Asn His Asn Tyr Val Arg Gly
485 490 495

TCC ATT ACA CTT TTT ATC ATC AAC TTG CAT CGA TCA AGA AAG AAA 1555
Ser Ile Thr Leu Phe Ile Ile Asn Leu His Arg Ser Arg Lys Lys
500 505 510

ATC AAG CTG GCT GGG ACT CTC AGA GAC AAG CTG GTT CAC CAG TAC 1600
Ile Lys Leu Ala Gly Thr Leu Arg Asp Lys Leu Val His Gln Tyr
515 520 525

CTG CTG CAG CCC TAT GGG CAG GAG GGC CTA AAG TCC AAG TCA GTG 1645
Leu Leu Gln Pro Tyr Gly Gln Glu Gly Leu Lys Ser Lys Ser Val
530 535 540

CAA CTG AAT GGC CAG CCC TTA GTG ATG GTG GAC GAC GGG ACC CTC 1690
Gln Leu Asn Gly Gln Pro Leu Val Met Val Asp Asp Gly Thr Leu
545 550 555

CCA GAA TTG AAG CCC CGC CCC CTT CGG GCC GGC CGG ACA TTG GTC 1735
Pro Glu Leu Lys Pro Arg Pro Leu Arg Ala Gly Arg Thr Leu Val
560 565 570

ATC CCT CCA GTC ACC ATG GGC TTT TTT GTG GTC AAG AAT GTC AAT 1780
Ile Pro Pro Val Thr Met Gly Phe Phe Val Val Lys Asn Val Asn
575 580 585

GCT TTG GCC TGC CGC TAC CGA TAA GCT ATC CTC ACA CTC ATG GCT 1825
Ala Leu Ala Cys Arg Tyr Arg
590

ACC AGT GGG CCT GCT GGG CTG CTT CCA CTC CTC CAC TCC AGT AGT 1870

ATC CTC TGT TTT CAG ACA TCC TAG CAA CCA GCC CCT GCT GCC CCA 1915

TCC TGC TGG AAT CAA CAC AGA CTT GCT CTC CAA AGA GAC TAA ATG 1960

TCA TAG CGT GAT CTT AGC CTA GGT AGG CCA CAT CCA TCC CAA AGG 2005

AAA ATG TAG ACA TCA CCT GTA CCT ATA TAA GGA TAA AGG CAT GTG 2050

TAT AGA GCA A 2060



(2) INFORMATION FOR SEQ ID NO: 3: 

(i) SEQUENCE CHARACTERISTICS: 









(A) LENGTH: 592

(B) TYPE: amino acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 3:






Met Arg Val Leu Cys Ala Phe Pro Glu Ala Met Pro Ser Ser Asn
5 10 15

Ser Arg Pro Pro Ala Cys Leu Ala Pro Gly Ala Leu Tyr Leu Ala
20 25 30

Leu Leu Leu His Leu Ser Leu Ser Ser Gln Ala Gly Asp Arg Arg
35 40 45

Pro Leu Pro Val Asp Arg Ala Ala Gly Leu Lys Glu Lys Thr Leu
50 55 60

Ile Leu Leu Asp Val Ser Thr Lys Asn Pro Val Arg Thr Val Asn
65 70 75

Glu Asn Phe Leu Ser Leu Gln Leu Asp Pro Ser Ile Ile His Asp
80 85 90

Gly Trp Leu Asp Phe Leu Ser Ser Lys Arg Leu Val Thr Leu Ala
95 100 105

Arg Gly Leu Ser Pro Ala Phe Leu Arg Phe Gly Gly Lys Arg Thr
110 115 120

Asp Phe Leu Gln Phe Gln Asn Leu Arg Asn Pro Ala Lys Ser Arg
125 130 135

Gly Gly Pro Gly Pro Asp Tyr Tyr Leu Lys Asn Tyr Glu Asp Asp
140 145 150

Ile Val Arg Ser Asp Val Ala Leu Asp Lys Gln Lys Gly Cys Lys
155 160 165

Ile Ala Gln His Pro Asp Val Met Leu Glu Leu Gln Arg Glu Lys
170 175 180

Ala Ala Gln Met His Leu Val Leu Leu Lys Glu Gln Phe Ser Asn
185 190 195

Thr Tyr Ser Asn Leu Ile Leu Thr Ala Arg Ser Leu Asp Lys Leu
200 205 210

Tyr Asn Phe Ala Asp Cys Ser Gly Leu His Leu Ile Phe Ala Leu
215 220 225

Asn Ala Leu Arg Arg Asn Pro Asn Asn Ser Trp Asn Ser Ser Ser
230 235 240

Ala Leu Ser Leu Leu Lys Tyr Ser Ala Ser Lys Lys Tyr Asn Ile
245 250 255

Ser Trp Glu Leu Gly Asn Glu Pro Asn Asn Tyr Arg Thr Met His
260 265 270

Gly Arg Ala Val Asn Gly Ser Gln Leu Gly Lys Asp Tyr Ile Gln
275 280 285

Leu Lys Ser Leu Leu Gln Pro Ile Arg Ile Tyr Ser Arg Ala Ser
290 295 300

Leu Tyr Gly Pro Asn Ile Gly Arg Pro Arg Lys Asn Val Ile Ala
305 310 315

Leu Leu Asp Gly Phe Met Lys Val Ala Gly Ser Thr Val Asp Ala
320 325 330

Val Thr Trp Gln His Cys Tyr Ile Asp Gly Arg Val Val Lys Val
335 340 345

Met Asp Phe Leu Lys Thr Arg Leu Leu Asp Thr Leu Ser Asp Gln
350 355 360

Ile Arg Lys Ile Gln Lys Val Val Asn Thr Tyr Thr Pro Gly Lys
365 370 375

Lys Ile Trp Leu Glu Gly Val Val Thr Thr Ser Ala Gly Gly Thr
380 385 390

Asn Asn Leu Ser Asp Ser Tyr Ala Ala Gly Phe Leu Trp Leu Asn
395 400 405

Thr Leu Gly Met Leu Ala Asn Gln Gly Ile Asp Val Val Ile Arg
410 415 420

His Ser Phe Phe Asp His Gly Tyr Asn His Leu Val Asp Gln Asn
425 430 435

Phe Asn Pro Leu Pro Asp Tyr Trp Leu Ser Leu Leu Tyr Lys Arg
440 445 450

Leu Ile Gly Pro Lys Val Leu Ala Val His Val Ala Gly Leu Gln
455 460 465

Arg Lys Pro Arg Pro Gly Arg Val Ile Arg Asp Lys Leu Arg Ile
470 475 480

Tyr Ala His Cys Thr Asn His His Asn His Asn Tyr Val Arg Gly
485 490 495

Ser Ile Thr Leu Phe Ile Ile Asn Leu His Arg Ser Arg Lys Lys
500 505 510

Ile Lys Leu Ala Gly Thr Leu Arg Asp Lys Leu Val His Gln Tyr
515 520 525

Leu Leu Gln Pro Tyr Gly Gln Glu Gly Leu Lys Ser Lys Ser Val
530 535 540

Gln Leu Asn Gly Gln Pro Leu Val Met Val Asp Asp Gly Thr Leu
545 550 555

Pro Glu Leu Lys Pro Arg Pro Leu Arg Ala Gly Arg Thr Leu Val
560 565 570

Ile Pro Pro Val Thr Met Gly Phe Phe Val Val Lys Asn Val Asn
575 580 585

Ala Leu Ala Cys Arg Tyr Arg
590

(2) INFORMATION FOR SEQ ID NO: 4: 

(i) SEQUENCE CHARACTERISTICS: 



(A) LENGTH: 1898 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : double 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 4:

CGCTTAATTC TAGAAGAGGG ATTGAATGAG GGTGCTTTGT GCCTTCCCTG 50 

AAGCCATGCC CTCCAGCAAC TCCCGCCCCC CCGCGTGCCT AGCCCCGGGG 100 

GCTCTCTACT TGGCTCTGTT GCTCCATCTC TCCCTTTCCT CCCAGGCTGG 150 

AGACAGGAGA CCCTTGCCTG TAGACAGAGC TGCAGGTTTG AAGGAAAAGA 200 

CCCTGATTCt ACTTGATGTG AGCACCAAGA ACCCAGTCAG GACAGTCAAT 250 

GAGAACTTCC TCTCTCTGCA GCTGGATCCG TCCATCATTC ATGATGGCTG 300 

GCTCGATTTC CTAAGCTCCA AGCGCTTGGT GACCCTGGCC CGGGGACTTT 350 

CGCCCGCCTT TCTGCGCTTC GGGGGCAAAA GGACCGACTT CCTGCAGTTC 400

CAGAACCTGA GGAACCCGGC GAAAAGCCGC GGGGGCCCGG GCCCGGATTA 450 

CTATCTCAAA AACTATGAGG ATGCCAGGTC TCTAGACAAA CTTTATAACT 500 

TTGCTGATTG CTCTGGACTC CACCTGATAT TTGCTCTAAA TGCACTGCGT 550 

CGTAATCCCA ATAACTCCTG GAACAGTTCT AGTGCCCTGA GTCTGTTGAA 600 

GTACAGCGCC AGCAAAAAGT ACAACATTTC TTGGGAACTG GGTAATGAGC 650 

CAAATAACTA TCGGACCATG CATGGCCGGG CAGTAAATGG CAGCCAGTTG 700 

GGAAAGGATT ACATCCAGCT GAAGAGCCTG TTGCAGCCCA TCCGGATTTA 750 

TTCCAGAGCC AGCTTATATG GCCCTAATAT TGGGCGGCCG AGGAAGAATG 800 

TCATCGCCCT CCTAGATGGA TTCATGAAGG TGGCAGGAAG TACAGTAGAT 850 

GCAGTTACCT GGCAACATTG CTACATTGAT GGCCGGGTGG TCAAGGTGAT 900 

GGACTTCCTG AAAACTCGCC TGTTAGACAC ACTCTCTGAC CAGATTAGGA 950 

AAATTCAGAA AGTGGTTAAT ACATACACTC CAGGAAAGAA GATTTGGCTT 1000 

GAAGGTGTGG TGACCACCTC AGCTGGAGGC ACAAACAATC TATCCGATTC 1050 

CTATGCTGCA GGATTCTTAT GGTTGAACAC TTTAGGAATG CTGGCCAATC 1100 

AGGGCATTGA TGTCGTGATA CGGCACTCAT TTTTTGACCA TGGATACAAT 1150 

CACCTCGTGG ACCAGAATTT TAACCCATTA CCAGACTACT GGCTCTCTCT 1200 

CCTCTACAAG CGCCTGATCG GCCCCAAAGT CTTGGCTGTG CATGTGGCTG 1250 

GGCTCCAGCG GAAGCCACGG CCTGGCCGAG TGATCCGGGA CAAAcTAAGG 1300 

ATTTATGCTC ACTGCACAAA CCACCACAAC CACAACTACG TTCGTGGGTC 1350 

CATTACACTT TTTATCATCA ACTTGCATCG ATCAAGAAAG AAAATCAAGC 1400 

TGGCTGGGAC TCTCAGAGAC AAGCTGGTTC ACCAGTACCT GCTGCAGCCC 1450

TATGGGCAGG AGGGCCTAAA GTCCAAGTCA GTGCAACTGA ATGGCCAGCC 1500 

CTTAGTGATG GTGGACGACG GGACCCTCCC AGAATTGAAG CCCCGCCCCC 1550 

TTCGGGCCGG CCGGACATTG GTCATCCCTC CAGTCACCAT GGGCTTTTTT 1600 

GTGGTCAAGA ATGTCAATGC TTTGGCCTGC CGCTACCGAT AAGCTATCCT 1650 

CACACTCATG GCTACCAGTG GGCCTGCTGG GCTGCTTCCA CTCCTCCACT 1700 

CCAGTAGTAT CCTCTGTTTT CAGACATCCT AGCAACCAGC CCCTGCTGCC 1750 

CCATCCTGCT GGAATCAACA CAGACTTGCT CTCCAAAGAG ACTAAATGTC 1800 

ATAGCGTGAT CTTAGCCTAG GTAGGCCACA TCCATCCCAA AGGAAAATGT 1850 

AGACATCACC TGTACCTATA TAAGGATAAA GGCATGTGTA TAGAGCAA 1898 
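
The lengths stated in the sequence listing allow a simple consistency check between the full-length cDNA and the shorter form. The sketch below is illustrative only; it assumes that SEQ ID NO:4 differs from SEQ ID NO:1 by a single internal deletion within the coding region, which is an interpretation of the listing rather than a statement taken from it.

```python
# Lengths as stated in the sequence listing.
FULL_LENGTH_NT = 2060  # SEQ ID NO:1
SHORT_FORM_NT = 1898   # SEQ ID NO:4
FULL_LENGTH_AA = 592   # SEQ ID NO:3
SHORT_FORM_AA = 538    # SEQ ID NO:5

missing_nt = FULL_LENGTH_NT - SHORT_FORM_NT
missing_aa = FULL_LENGTH_AA - SHORT_FORM_AA

# An in-frame deletion removes a whole number of codons, and that number
# should equal the difference in polypeptide length.
in_frame = missing_nt % 3 == 0 and missing_nt // 3 == missing_aa

print(missing_nt, missing_aa, in_frame)  # 162 54 True
```

A difference of 162 nucleotides corresponds to exactly 54 codons, which agrees with the 54-residue difference between the two polypeptides and is consistent with an in-frame deletion.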



(2) INFORMATION FOR SEQ ID NO: 5:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 538

(B) TYPE: amino acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 5:



Met Arg Val Leu Cys Ala Phe Pro Glu Ala Met Pro Ser Ser Asn
5 10 15

Ser Arg Pro Pro Ala Cys Leu Ala Pro Gly Ala Leu Tyr Leu Ala
20 25 30

Leu Leu Leu His Leu Ser Leu Ser Ser Gln Ala Gly Asp Arg Arg
35 40 45

Pro Leu Pro Val Asp Arg Ala Ala Gly Leu Lys Glu Lys Thr Leu
50 55 60

Ile Leu Leu Asp Val Ser Thr Lys Asn Pro Val Arg Thr Val Asn
65 70 75

Glu Asn Phe Leu Ser Leu Gln Leu Asp Pro Ser Ile Ile His Asp
80 85 90

Gly Trp Leu Asp Phe Leu Ser Ser Lys Arg Leu Val Thr Leu Ala
95 100 105

Arg Gly Leu Ser Pro Ala Phe Leu Arg Phe Gly Gly Lys Arg Thr
110 115 120

Asp Phe Leu Gln Phe Gln Asn Leu Arg Asn Pro Ala Lys Ser Arg
125 130 135

Gly Gly Pro Gly Pro Asp Tyr Tyr Leu Lys Asn Tyr Glu Asp Ala
140 145 150

Arg Ser Leu Asp Lys Leu Tyr Asn Phe Ala Asp Cys Ser Gly Leu
155 160 165

His Leu Ile Phe Ala Leu Asn Ala Leu Arg Arg Asn Pro Asn Asn
170 175 180

Ser Trp Asn Ser Ser Ser Ala Leu Ser Leu Leu Lys Tyr Ser Ala
185 190 195

Ser Lys Lys Tyr Asn Ile Ser Trp Glu Leu Gly Asn Glu Pro Asn
200 205 210

Asn Tyr Arg Thr Met His Gly Arg Ala Val Asn Gly Ser Gln Leu
215 220 225

Gly Lys Asp Tyr Ile Gln Leu Lys Ser Leu Leu Gln Pro Ile Arg
230 235 240

Ile Tyr Ser Arg Ala Ser Leu Tyr Gly Pro Asn Ile Gly Arg Pro
245 250 255

Arg Lys Asn Val Ile Ala Leu Leu Asp Gly Phe Met Lys Val Ala
260 265 270

Gly Ser Thr Val Asp Ala Val Thr Trp Gln His Cys Tyr Ile Asp
275 280 285

Gly Arg Val Val Lys Val Met Asp Phe Leu Lys Thr Arg Leu Leu
290 295 300

Asp Thr Leu Ser Ala Gln Ile Arg Lys Ile Gln Lys Val Val Asn
305 310 315

Thr Tyr Thr Pro Gly Lys Lys Ile Trp Leu Glu Gly Val Val Thr
320 325 330

Thr Ser Ala Gly Gly Thr Asn Asn Leu Ser Asp Ser Tyr Ala Ala
335 340 345

Gly Phe Leu Trp Leu Asn Thr Leu Gly Met Leu Ala Asn Gln Gly
350 355 360

Ile Asp Val Val Ile Arg His Ser Phe Phe Asp His Gly Tyr Asn
365 370 375

His Leu Val Asp Gln Asn Phe Asn Pro Leu Pro Asp Tyr Trp Leu
380 385 390

Ser Leu Leu Tyr Lys Arg Leu Ile Gly Pro Lys Val Leu Ala Val
395 400 405

His Val Ala Gly Leu Gln Arg Lys Pro Arg Pro Gly Arg Val Ile
410 415 420

Arg Asp Lys Leu Arg Ile Tyr Ala His Cys Thr Asn His His Asn
425 430 435


His 


Asn 


Tyr 


Val 


Arg 
440 


Gly 


Ser 


He 


Thr 


Leu 
445 


Phe 


He 


He 


Asn 


Leu 
450 


His 


Arg 


Ser 


Arg 


Lys 
455 


Lys 


He 


Lys 


Leu 


Ala 
460 


Gly 


Thr 


Leu 


Arg 


Asp 
465 


Lys 


Leu 


Val 


His 


Gin 
470 


Tyr 


Leu 


Leu 


Gin 


Pro 
475 


Tyr 


Gly 


Gin 


Glu 


Gly 
480 


Leu 


Lys 


Ser 


Lys 


Ser 


Val 


Gin 


Leu 


Asn 


Gly 


Gin 


Pro 


Leu 


Val 


Met 






485 










490 










495 


Val 


Asp 


Asp 


Gly 


Thr 
500 


Leu 


Pro 


Glu 


Leu 


Lys 
505 


Pro 


Arg 


Pro 


Leu 


Arg 
510 


Ala 


Gly Arg 


Thr 


Leu 


val 


He 


Pro 


Pro 


Val 


Thr 


Met 


Gly 


Phe 


Phe 










515 










520 










525 


Val 


Val 


Lys 


Asn 


Val 
530 


Asn 


Ala 


Leu 


Ala 


Cys 
535 


Arg 


Tyr 


Arg 







(2) INFORMATION FOR SEQ ID NO: 6:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 1724

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 6: 

CGCTTAATTC TAGAAGAGGG ATTGAATGAG GGTGCTTTGT GCCTTCCCTG 50 

AAGCCATGCC CTCCAGCAAC TCCCGCCCCC CCGCGTGCCT AGCCCCGGGG 100 

GCTCTCTACT TGGCTCTGTT GCTCCATCTC TCCCTTTCCT CCCAGGCTGG 150 

AGACAGGAGA CCCTTGCCTG TAGACAGAGC TGCAGGTTTG AAGGAAAAGA 200 

CCCTGATTCT ACTTGATGTG AGCACCAAGA ACCCAGTCAG GACAGTCAAT 250 

GAGAACTTCC TCTCTCTGCA GCTGGATCCG TCCATCATTC ATGATGGCTG 300 

GCTCGATTTC CTAAGCTCCA AGCGCTTGGT GACCCTGGCC CGGGGACTTT 350 

CGCCCGCCTT TCTGCGCTTC GGGGGCAAAA GGACCGACTT CCTGCAGTTC 400

CAGAACCTGA GGAACCCGGC GAAAAGCCGC GGGGGCCCGG GCCCGGATTA 450 

CTATCTCAAA AACTATGAGG ATGAGCCAAA TAACTATCGG ACCATGCATG 500 

GCCGGGCAGT AAATGGCAGC CAGTTGGGAA AGGATTACAT CCAGCTGAAG 550 

AGCCTGTTGC AGCCCATCCG GATTTATTCC AGAGCCAGCT TATATGGCCC 600 

TAATATTGGG CGGCCGAGGA AGAATGTCAT CGCCCTCCTA GATGGATTCA 650 

TGAAGGTGGC AGGAAGTACA GTAGATGCAG TTACCTGGCA ACATTGCTAC 700 

ATTGATGGCC GGGTGGTCAA GGTGATGGAC TTCCTGAAAA CTCGCCTGTT 750

AGACACACTC TCTGACCAGA TTAGGAAAAT TCAGAAAGTG GTTAATACAT 800 

ACACTCCAGG AAAGAAGATT TGGCTTGAAG GTGTGGTGAC CACCTCAGCT 850 

GGAGGCACAA ACAATCTATC CGATTCCTAT GCTGCAGGAT TCTTATGGTT 900 

GAACACTTTA GGAATGCTGG CCAATCAGGG CATTGATGTC GTGATACGGC 950 

ACTCATTTTT TGACCATGGA TACAATCACC TCGTGGACCA GAATTTTAAC 1000 

CCATTACCAG ACTACTGGCT CTCTCTCCTC TACAAGCGCC TGATCGGCCC 1050 

CAAAGTCTTG GCTGTGCATG TGGCTGGGCT CCAGCGGAAG CCACGGCCTG 1100 

GCCGAGTGAT CCGGGACAAA CTAAGGATTT ATGCTCACTG CACAAACCAC 1150 

CACAACCACA ACTACGTTCG TGGGTCCATT ACACTTTTTA TCATCAACTT 1200 

GCATCGATCA AGAAAGAAAA TCAAGCTGGC TGGGACTCTC AGAGACAAGC 1250 

TGGTTCACCA GTACCTGCTG CAGCCCTATG GGCAGGAGGG CCTAAAGTCC 1300 

AAGTCAGTGC AACTGAATGG CCAGCCCTTA GTGATGGTGG ACGACGGGAC 1350 

CCTCCCAGAA TTGAAGCCCC GCCCCCTTCG GGCCGGCCGG ACATTGGTCA 1400 

TCCCTCCAGT CACCATGGGC TTTTTTGTGG TCAAGAATGT CAATGCTTTG 1450 

GCCTGCCGCT ACCGATAAGC TATCCTCACA CTCATGGCTA CCAGTGGGCC 1500 

TGCTGGGCTG CTTCCACTCC TCCACTCCAG TAGTATCCTC TGTTTTCAGA 1550 

CATCCTAGCA ACCAGCCCCT GCTGCCCCAT CCTGCTGGAA TCAACACAGA 1600 

CTTGCTCTCC AAAGAGACTA AATGTCATAG CGTGATCTTA GCCTAGGTAG 1650 

GCCACATCCA TCCCAAAGGA AAATGTAGAC ATCACCTGTA CCTATATAAG 1700

GATAAAGGCA TGTGTATAGA GCAA 1724

(2) INFORMATION FOR SEQ ID NO: 7:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 480

(B) TYPE: amino acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 7:

Met Arg Val Leu Cys Ala Phe Pro Glu Ala Met Pro Ser Ser Asn
1               5                   10                  15

Ser Arg Pro Pro Ala Cys Leu Ala Pro Gly Ala Leu Tyr Leu Ala
                20                  25                  30

Leu Leu Leu His Leu Ser Leu Ser Ser Gln Ala Gly Asp Arg Arg
                35                  40                  45

Pro Leu Pro Val Asp Arg Ala Ala Gly Leu Lys Glu Lys Thr Leu
                50                  55                  60

Ile Leu Leu Asp Val Ser Thr Lys Asn Pro Val Arg Thr Val Asn
                65                  70                  75

Glu Asn Phe Leu Ser Leu Gln Leu Asp Pro Ser Ile Ile His Asp
                80                  85                  90

Gly Trp Leu Asp Phe Leu Ser Ser Lys Arg Leu Val Thr Leu Ala
                95                  100                 105

Arg Gly Leu Ser Pro Ala Phe Leu Arg Phe Gly Gly Lys Arg Thr
                110                 115                 120

Asp Phe Leu Gln Phe Gln Asn Leu Arg Asn Pro Ala Lys Ser Arg
                125                 130                 135

Gly Gly Pro Gly Pro Asp Tyr Tyr Leu Lys Asn Tyr Glu Asp Glu
                140                 145                 150

Pro Asn Asn Tyr Arg Thr Met His Gly Arg Ala Val Asn Gly Ser
                155                 160                 165

Gln Leu Gly Lys Asp Tyr Ile Gln Leu Lys Ser Leu Leu Gln Pro
                170                 175                 180

Ile Arg Ile Tyr Ser Arg Ala Ser Leu Tyr Gly Pro Asn Ile Gly
                185                 190                 195

Arg Pro Arg Lys Asn Val Ile Ala Leu Leu Asp Gly Phe Met Lys
                200                 205                 210

Val Ala Gly Ser Thr Val Asp Ala Val Thr Trp Gln His Cys Tyr
                215                 220                 225

Ile Asp Gly Arg Val Val Lys Val Met Asp Phe Leu Lys Thr Arg
                230                 235                 240

Leu Leu Asp Thr Leu Ser Asp Gln Ile Arg Lys Ile Gln Lys Val
                245                 250                 255

Val Asn Thr Tyr Thr Pro Gly Lys Lys Ile Trp Leu Glu Gly Val
                260                 265                 270

Val Thr Thr Ser Ala Gly Gly Thr Asn Asn Leu Ser Asp Ser Tyr
                275                 280                 285

Ala Ala Gly Phe Leu Trp Leu Asn Thr Leu Gly Met Leu Ala Asn
                290                 295                 300

Gln Gly Ile Asp Val Val Ile Arg His Ser Phe Phe Asp His Gly
                305                 310                 315

Tyr Asn His Leu Val Asp Gln Asn Phe Asn Pro Leu Pro Asp Tyr
                320                 325                 330

Trp Leu Ser Leu Leu Tyr Lys Arg Leu Ile Gly Pro Lys Val Leu
                335                 340                 345

Ala Val His Val Ala Gly Leu Gln Arg Lys Pro Arg Pro Gly Arg
                350                 355                 360

Val Ile Arg Asp Lys Leu Arg Ile Tyr Ala His Cys Thr Asn His
                365                 370                 375

His Asn His Asn Tyr Val Arg Gly Ser Ile Thr Leu Phe Ile Ile
                380                 385                 390

Asn Leu His Arg Ser Arg Lys Lys Ile Lys Leu Ala Gly Thr Leu
                395                 400                 405

Arg Asp Lys Leu Val His Gln Tyr Leu Leu Gln Pro Tyr Gly Gln
                410                 415                 420

Glu Gly Leu Lys Ser Lys Ser Val Gln Leu Asn Gly Gln Pro Leu
                425                 430                 435

Val Met Val Asp Asp Gly Thr Leu Pro Glu Leu Lys Pro Arg Pro
                440                 445                 450

Leu Arg Ala Gly Arg Thr Leu Val Ile Pro Pro Val Thr Met Gly
                455                 460                 465

Phe Phe Val Val Lys Asn Val Asn Ala Leu Ala Cys Arg Tyr Arg
                470                 475                 480



(2) INFORMATION FOR SEQ ID NO: 8: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 351 

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 8:
GTTCGGCAGA GGATCATGTC TGATGTACAG AGACATTGTC CGGAGTGATG 50
TTGCCTTGGA CAAGCAGAAA GGCTGTAAGA TTGGCCAGCA CCCTGATGTC 100

ATGCTGGAGC TCCAGAGAGA GAAGGCATCC AGACTGTCTG GTTCTTCTGA 150
AGGAGCAATA CTCCAATACT TACAGTAACC TCATATTAAC AGGTCTCTAG 200
ACAAACTTTA TAACTTTGCT GATTGCTCTG GACTCCACCT GATATTTGCT 250
CTAAATGCAC TGCGTCGTAA TCCCAATAAC TCCTGGAACA GTTCTAGTGC 300 
CCTGAGCCTG TTGAAGTACA GTGCCAGCAA AAAGTACAAC ATTTCTTGGG 350 
A 351 

(2) INFORMATION FOR SEQ ID NO: 9:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 543 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 9:

Met Leu Leu Arg Ser Lys Pro Ala Leu Pro Pro Pro Leu Met Leu Leu
1               5                   10                  15

Leu Leu Gly Pro Leu Gly Pro Leu Ser Pro Gly Ala Leu Pro Arg Pro
            20                  25                  30

Ala Gln Ala Gln Asp Val Val Asp Leu Asp Phe Phe Thr Gln Glu Pro
        35                  40                  45

Leu His Leu Val Ser Pro Ser Phe Leu Ser Val Thr Ile Asp Ala Asn
    50                  55                  60

Leu Ala Thr Asp Pro Arg Phe Leu Ile Leu Leu Gly Ser Pro Lys Leu
65                  70                  75                  80

Arg Thr Leu Ala Arg Gly Leu Ser Pro Ala Tyr Leu Arg Phe Gly Gly
                85                  90                  95

Thr Lys Thr Asp Phe Leu Ile Phe Asp Pro Lys Lys Glu Ser Thr Phe
            100                 105                 110

Glu Glu Arg Ser Tyr Trp Gln Ser Gln Val Asn Gln Asp Ile Cys Lys
        115                 120                 125

Tyr Gly Ser Ile Pro Pro Asp Val Glu Glu Lys Leu Arg Leu Glu Trp
    130                 135                 140

Pro Tyr Gln Glu Gln Leu Leu Leu Arg Glu His Tyr Gln Lys Lys Phe
145                 150                 155                 160

Lys Asn Ser Thr Tyr Ser Arg Ser Ser Val Asp Val Leu Tyr Thr Phe
                165                 170                 175

Ala Asn Cys Ser Gly Leu Asp Leu Ile Phe Gly Leu Asn Ala Leu Leu
            180                 185                 190

Arg Thr Ala Asp Leu Gln Trp Asn Ser Ser Asn Ala Gln Leu Leu Leu
        195                 200                 205

Asp Tyr Cys Ser Ser Lys Gly Tyr Asn Ile Ser Trp Glu Leu Gly Asn
    210                 215                 220

Glu Pro Asn Ser Phe Leu Lys Lys Ala Asp Ile Phe Ile Asn Gly Ser
225                 230                 235                 240

Gln Leu Gly Glu Asp Tyr Ile Gln Leu His Lys Leu Leu Arg Lys Ser
                245                 250                 255

Thr Phe Lys Asn Ala Lys Leu Tyr Gly Pro Asp Val Gly Gln Pro Arg
            260                 265                 270

Arg Lys Thr Ala Lys Met Leu Lys Ser Phe Leu Lys Ala Gly Gly Glu
        275                 280                 285

Val Ile Asp Ser Val Thr Trp His His Tyr Tyr Leu Asn Gly Arg Thr
    290                 295                 300

Ala Thr Arg Glu Asp Phe Leu Asn Pro Asp Val Leu Asp Ile Phe Ile
305                 310                 315                 320

Ser Ser Val Gln Lys Val Phe Gln Val Val Glu Ser Thr Arg Pro Gly
                325                 330                 335

Lys Lys Val Trp Leu Gly Glu Thr Ser Ser Ala Tyr Gly Gly Gly Ala
            340                 345                 350

Pro Leu Leu Ser Asp Thr Phe Ala Ala Gly Phe Met Trp Leu Asp Lys
        355                 360                 365

Leu Gly Leu Ser Ala Arg Met Gly Ile Glu Val Val Met Arg Gln Val
    370                 375                 380

Phe Phe Gly Ala Gly Asn Tyr His Leu Val Asp Glu Asn Phe Asp Pro
385                 390                 395                 400

Leu Pro Asp Tyr Trp Leu Ser Leu Leu Phe Lys Lys Leu Val Gly Thr
                405                 410                 415

Lys Val Leu Met Ala Ser Val Gln Gly Ser Lys Arg Arg Lys Leu Arg
            420                 425                 430

Val Tyr Leu His Cys Thr Asn Thr Asp Asn Pro Arg Tyr Lys Glu Gly
        435                 440                 445

Asp Leu Thr Leu Tyr Ala Ile Asn Leu His Asn Val Thr Lys Tyr Leu
    450                 455                 460

Arg Leu Pro Tyr Pro Phe Ser Asn Lys Gln Val Asp Lys Tyr Leu Leu
465                 470                 475                 480

Arg Pro Leu Gly Pro His Gly Leu Leu Ser Lys Ser Val Gln Leu Asn
                485                 490                 495

Gly Leu Thr Leu Lys Met Val Asp Asp Gln Thr Leu Pro Pro Leu Met
            500                 505                 510

Glu Lys Pro Leu Arg Pro Gly Ser Ser Leu Gly Leu Pro Ala Phe Ser
        515                 520                 525

Tyr Ser Phe Phe Val Ile Arg Asn Ala Lys Val Ala Ala Cys Ile
    530                 535                 540



(2) INFORMATION FOR SEQ ID NO: 10: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 23 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: Single 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 10: 
GGAGAGCAAG TCTGTGTTGA TTC 23 

(2) INFORMATION FOR SEQ ID NO: 11:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 22 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 11: 
CACTGGTAGC CATGAGTGTG AG 22 

(2) INFORMATION FOR SEQ ID NO: 12:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 22

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 12: 
TTGGTCATCC CTCCAGTCAC CA 22 

(2) INFORMATION FOR SEQ ID NO: 13: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 2 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 13 

Asp Glu



(2) INFORMATION FOR SEQ ID NO: 14:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 23 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: Single 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 14: 
CTTGCCTGTA GACAGAGCTG CAG 23 



(2) INFORMATION FOR SEQ ID NO: 15:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 2396 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 15 

TTTCTAGTTG CTTTTAGCCA ATGTCGGATC AGGTTTTTCA AGCGACAAAG 50 

AGATACTGAG ATCCTGGGCA GAGGACATCC TAGCTCGGTC AGATTTGGGC 100 

AGGCTCAAGT GACCAGTGTC TTAAGGCAGA AGGGAGTCGG GGTAGGGTCT 150 

GGCTGAACCC TCAACCGGGG CTTTTAACTC AGGGTCTAGT CCTGGCGCCA 200

AATGGATGGG ACCTAGAAAA GGTGACAGAG TGCGCAGGAC ACCAGGAAGC 250

TGGTCCCACC CCTGCGCGGC TCCCGGGCGC TCCCTCCCCA GGCCTCCGAG 300 

GATCTTGGAT TCTGGCCACC TCCGCACCCT TTGGATGGGT GTGGATGATT 350 

TCAAAAGTGG ACGTGACCGC GGCGGAGGGG AAAGCCAGCA CGGAAATGAA 400

AGAGAGCGAG GAGGGGAGGG CGGGGAGGGG AGGGCGCTAG GGAGGGACTC 450

CCGGGAGGGG TGGGAGGGAT GGAGCGCTGT GGGAGGGTAC TGAGTCCTGG 500 

CGCCAGAGGC GAAGCAGGAC CGGTTGCAGG GGGCTTGAGC CAGCGCGCCG 550 

GCTGCCCCAG CTCTCCCGGC AGCGGGCGGT CCAGCCAGGT GGGATGCTGA 600 

GGCTGCTGCT GCTGTGGCTC TGGGGGCCGC TCGGTGCCCT GGCCCAGGGC 650 

GCCCCCGCGG GGACCGCGCC GACCGACGAC GTGGTAGACT TGGAGTTTTA 700 

CACCAAGCGG CCGCTCCGAA GCGTGAGTCC CTCGTTCCTG TCCATCACCA 750 

TCGACGCCAG CCTGGCCACC GACCCGCGCT TCCTCACCTT CCTGGGCTCT 800 

CCAAGGCTCC GTGCTCTGGC TAGAGGCTTA TCTCCTGCAT ACTTGAGATT 850 

TGGCGGCACA AAGACTGACT TCCTTATTTT TGATCCGGAC AAGGAACCGA 900 

CTTCCGAAGA AAGAAGTTAC TGGAAATCTC AAGTCAACCA TGATATTTGC 950 

AGGTCTGAGC CGGTCTCTGC TGCGGTGTTG AGGAAACTCC AGGTGGAATG 1000 

GCCCTTCCAG GAGCTGTTGC TGCTCCGAGA GCAGTACCAA AAGGAGTTCA 1050

AGAACAGCAC CTACTCAAGA AGCTCAGTGG ACATGCTCTA CAGTTTTGCC 1100

AAGTGCTCGG GGTTAGACCT GATCTTTGGT CTAAATGCGT TACTACGAAC 1150

CCCAGACTTA CGGTGGAACA GCTCCAACGC CCAGCTTCTC CTTGACTACT 1200

GCTCTTCCAA GGGTTATAAC ATCTCCTGGG AACTGGGCAA TGAGCCCAAC 1250

AGTTTCTGGA AGAAAGCTCA CATTCTCATC GATGGGTTGC AGTTAGGAGA 1300 

AGACTTTGTG GAGTTGCATA AACTTCTACA AAGGTCAGCT TTCCAAAATG 1350 

CAAAACTCTA TGGTCCTGAC ATCGGTCAGC CTCGAGGGAA GACAGTTAAA 1400 

CTGCTGAGGA GTTTCCTGAA GGCTGGCGGA GAAGTGATCG ACTCTCTTAC 1450 

ATGGCATCAC TATTACTTGA ATGGACGCAT CGCTACCAAA GAAGATTTTC 1500 

TGAGCTCTGA TGCGCTGGAC ACTTTTATTC TCTCTGTGCA AAAAATTCTG 1550 

AAGGTCACTA AAGAGATCAC ACCTGGCAAG AAGGTCTGGT TGGGAGAGAC 1600 

GAGCTCAGCT TACGGTGGCG GTGCACCCTT GCTGTCCAAC ACCTTTGCAG 1650 

CTGGCTTTAT GTGGCTGGAT AAATTGGGCC TGTCAGCCCA GATGGGCATA 1700 

GAAGTCGTGA TGAGGCAGGT GTTCTTCGGA GCAGGCAACT ACCACTTAGT 1750 

GGATGAAAAC TTTGAGCCTT TACCTGATTA CTGGCTCTCT CTTCTGTTCA 1800 

AGAAACTGGT AGGTCCCAGG GTGTTACTGT CAAGAGTGAA AGGCCCAGAC 1850 

AGGAGCAAAC TCCGAGTGTA TCTCCACTGC ACTAACGTCT ATCACCCACG 1900 

ATATCAGGAA GGAGATCTAA CTCTGTATGT CCTGAACCTC CATAATGTCA 1950 

CCAAGCACTT GAAGGTACCG CCTCCGTTGT TCAGGAAACC AGTGGATACG 2000 

TACCTTCTGA AGCCTTCGGG GCCGGATGGA TTACTTTCCA AATCTGTCCA 2050 

ACTGAACGGT CAAATTCTGA AGATGGTGGA TGAGCAGACC CTGCCAGCTT 2100 

TGACAGAAAA ACCTCTCCCC GCAGGAAGTG CACTAAGCCT GCCTGCCTTT 2150 

TCCTATGGTT TTTTTGTCAT AAGAAATGCC AAAATCGCTG CTTGTATATG 2200 

AAAATAAAAG GCATACGGTA CCCCTGAGAC AAAAGCCGAG GGGGGTGTTA 2250 

TTCATAAAAC AAAACCCTAG TTTAGGAGGC CACCTCCTTG CCGAGTTCCA 2300

GAGCTTCGGG AGGGTGGGGT ACACTTCAGT ATTACATTCA GTGTGGTGTT 2350 

CTCTCTAAGA AGAATACTGC AGGTGGTGAC AGTTAATAGC ACTGTG 2396 

(2) INFORMATION FOR SEQ ID NO: 16: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 22 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 16: 
GAGCAGCCAG GTGAGCCCAA GA 22 



(2) INFORMATION FOR SEQ ID NO: 17: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 24 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 17:
TCAGATGCAA GCAGCAACTT TGGC 24 



(2) INFORMATION FOR SEQ ID NO: 18: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 18: 
CACCCTGATG TCATGCTGGA G 21 



(2) INFORMATION FOR SEQ ID NO: 19: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 23 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 19:
CATCTAGGAG AGCAATGACG TTC 23 



(2) INFORMATION FOR SEQ ID NO: 20: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 27 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 20: 
CCATCCTAAT ACGACTCACT ATAGGGC 27 



(2) INFORMATION FOR SEQ ID NO: 21: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 23 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 21:
ACTCACTATA GGGCTCGAGC GGC 23



(2) INFORMATION FOR SEQ ID NO: 22: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 15 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 22: 



(2) INFORMATION FOR SEQ ID NO: 23:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 560

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 23:

GGCACGAGGC TAGTGGAGAG ACTGACAAGC AGTCAGCTCA GCGGTCACAA 50 

TACTGTGTGA CAGGAGCTGA GATCCAAGAA GTACTGGGTC CTGTGGGAGC 100 

ACCCCTGACT TGAAGGACAA GTCAGTGCAA CTGAATGGCC AGCCCTTAGT 150 

GATGGTGGAC GACGGGACCC TCCCAGAATT GAAGCCCCGC CCCCTTCGGG 200 

CCGGCCGGAC ATTGGTCATC CCTCCAGTCA CCATGGGCTT TTTTGTGGTC 250 

AAGAATGTCA ATGCTTTGGC CTGCCGCTAC CGATAAGCTA TCCTCACACT 300 

CATGGCTACC AGTGGGCCTG CTGGGCTGCT TCCACTCCTC CACTCCAGTA 350 

GTATCCTCTG TTTTCAGACA TCCTAGCAAC CAGCCCCTGC TGCCCCATCC 400 

TGCTGGAATC AACACAGACT TGCTCTCCAA AGAGACTAAA TGTCATAGCG 450

TGATCTTAGC CTAGGTAGGC CACATCCATC CCAAAGGAAA ATGTAGACAT 500 

CACCTGTACC TATATAAGGA TAAAGGCATG TGTATAGAGC AAAAAAAAAA 550 

AAAAAAAAAA 560 



(2) INFORMATION FOR SEQ ID NO: 24:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 1721

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 24:

CTAGAGCTTT CGACTCTCCG CTGCGCGGCA GCTGGCGGGG GGAGCAGCCA GGTGAGCCCA 60

AGATGCTGCT GCGCTCGAAG CCTGCGCTGC CGCCGCCGCT GATGCTGCTG CTCCTGGGGC 120

CGCTGGGTCC CCTCTCCCCT GGCGCCCTGC CCCGACCTGC GCAAGCACAG GACGTCGTGG 180

ACCTGGACTT CTTCACCCAG GAGCCGCTGC ACCTGGTGAG CCCCTCGTTC CTGTCCGTCA 240

CCATTGACGC CAACCTGGCC ACGGACCCGC GGTTCCTCAT CCTCCTGGGT TCTCCAAAGC 300

TTCGTACCTT GGCCAGAGGC TTGTCTCCTG CGTACCTGAG GTTTGGTGGC ACCAAGACAG 360

ACTTCCTAAT TTTCGATCCC AAGAAGGAAT CAACCTTTGA AGAGAGAAGT TACTGGCAAT 420

CTCAAGTCAA CCAGGATATT TGCAAATATG GATCCATCCC TCCTGATGTG GAGGAGAAGT 480

TACGGTTGGA ATGGCCCTAC CAGGAGCAAT TGCTACTCCG AGAACACTAC CAGAAAAAGT 540

TCAAGAACAG CACCTACTCA AGAAGCTCTG TAGATGTGCT ATACACTTTT GCAAACTGCT 600

CAGGACTGGA CTTGATCTTT GGCCTAAATG CGTTATTAAG AACAGCAGAT TTGCAGTGGA 660

ACAGTTCTAA TGCTCAGTTG CTCCTGGACT ACTGCTCTTC CAAGGGGTAT AACATTTCTT 720

GGGAACTAGG CAATGAACCT AACAGTTTCC TTAAGAAGGC TGATATTTTC ATCAATGGGT 780

CGCAGTTAGG AGAAGATTAT ATTCAATTGC ATAAACTTCT AAGAAAGTCC ACCTTCAAAA 840

ATGCAAAACT CTATGGTCCT GATGTTGGTC AGCCTCGAAG AAAGACGGCT AAGATGCTGA 900

AGAGCTTCCT GAAGGCTGGT GGAGAAGTGA TTGATTCAGT TACATGGCAT CACTACTATT 960

TGAATGGACG GACTGCTACC AGGGAAGATT TTCTAAACCC TGATGTATTG GACATTTTTA 1020

TTTCATCTGT GCAAAAAGTT TTCCAGGTGG TTGAGAGCAC CAGGCCTGGC AAGAAGGTCT 1080

GGTTAGGAGA AACAAGCTCT GCATATGGAG GCGGAGCGCC CTTGCTATCC GACACCTTTG 1140

CAGCTGGCTT TATGTGGCTG GATAAATTGG GCCTGTCAGC CCGAATGGGA ATAGAAGTGG 1200

TGATGAGGCA AGTATTCTTT GGAGCAGGAA ACTACCATTT AGTGGATGAA AACTTCGATC 1260

CTTTACCTGA TTATTGGCTA TCTCTTCTGT TCAAGAAATT GGTGGGCACC AAGGTGTTAA 1320

TGGCAAGCGT GCAAGGTTCA AAGAGAAGGA AGCTTCGAGT ATACCTTCAT TGCACAAACA 1380

CTGACAATCC AAGGTATAAA GAAGGAGATT TAACTCTGTA TGCCATAAAC CTCCATAACG 1440

TCACCAAGTA CTTGCGGTTA CCCTATCCTT TTTCTAACAA GCAAGTGGAT AAATACCTTC 1500

TAAGACCTTT GGGACCTCAT GGATTACTTT CCAAATCTGT CCAACTCAAT GGTCTAACTC 1560

TAAAGATGGT GGATGATCAA ACCTTGCCAC CTTTAATGGA AAAACCTCTC CGGCCAGGAA 1620

GTTCACTGGG CTTGCCAGCT TTCTCATATA GTTTTTTTGT GATAAGAAAT GCCAAAGTTG 1680

CTGCTTGCAT CTGAAAATAA AATATACTAG TCCTGACACT G 1721



(2) INFORMATION FOR SEQ ID NO: 25:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 45

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 25:

CTTACTTGTC ATCGTCGTCC TTGTAGTCTC GGTAGCGGCA GGCCA 45



